Node start and stop errors

Yes, it was strange: on a similar computer running an archival node, the node started quickly, but on another one it is taking a long time.

@tsachi it now responds to the REST API, but syncing is too slow. Based on the logs, it seems some relays have problems: the node fetches several blocks until it reaches a faulty relay, and then it logs a ConnectedOut event.

So, if I understand you correctly, you’re saying that the node is seeing a series of faulty relays, which slows down the catchup speed?

If so, could you name one or two of these, and I’ll see if I can fix them?

btw - “ConnectedOut” means that the node established an outgoing connection. It’s not an indication of a faulty connection.

Hmm, let me take a closer look.

@tsachi, log extract

{"file":"wsNetwork.go","function":"github.com/algorand/go-algorand/network.(*WebsocketNetwork).tryConnect","level":"warning","line":1814,"msg":"ws connect(ws://r-no.algorand-mainnet.network:4160/v1/mainnet-v1.0/gossip) fail: dial tcp 34.254.189.199:4160: connect: connection refused","name":"","time":"2020-08-19T19:25:39.404833Z"}
{"event":"ConnectedOut","file":"wsNetwork.go","function":"github.com/algorand/go-algorand/network.(*WebsocketNetwork).tryConnect","level":"info","line":1848,"local":"http:","msg":"Made outgoing connection to peer r-ne.algorand-mainnet.network:4160","name":"","remote":"r-ne.algorand-mainnet.network:4160","time":"2020-08-19T19:25:39.409831Z"}

34.254.189.199 shows connection refused, although that could be for any number of reasons.
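To tell failed connection attempts apart from ordinary ConnectedOut events, the log can be filtered for warning-level "ws connect(...) fail" messages. This is only a rough sketch: `failed_relays` is a made-up helper name, it assumes one JSON object per line as in the extract above, and the log path in the usage line (node.log inside the data directory) is an assumption about the local setup.

```shell
# Hypothetical helper: list relay endpoints whose outgoing connection
# attempts failed, with a count per endpoint (most frequent first).
# Assumes one JSON object per log line, as in the extract above.
failed_relays() {
    grep '"level":"warning"' "$1" \
        | grep -o 'ws connect([^)]*) fail' \
        | sed -e 's/^ws connect(//' -e 's/) fail$//' \
        | sort | uniq -c | sort -rn
}

# Example (path is an assumption): failed_relays /var/lib/algorand/node.log
```

Info-level ConnectedOut lines are skipped, so only endpoints that actually refused or dropped the connection show up.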

The second line, “ConnectedOut”, doesn’t imply any issue. The first one clearly does.
I’ll try to track this one down, but it will take a while. The relay might be temporarily out as a result of the upgrade.

Thanks @tsachi for the feedback.

Hi @tsachi,
I was not aware that I was not running the latest version. As I am on Debian, I assumed the auto-update feature was working.
I looked more closely at the warnings I had received from “sudo apt-get update”, which appeared even after I thought I had fully removed the algorand package:

E: Repository ‘https://releases.algorand.com/deb stable InRelease’ changed its ‘Origin’ value from ‘algorand’ to ‘Algorand’
E: Repository ‘https://releases.algorand.com/deb stable InRelease’ changed its ‘Label’ value from ‘algorand’ to ‘Algorand’
N: This must be accepted explicitly before updates for this repository can be applied. See apt-secure(8) manpage for details.

It was clear that something was still stuck and the only thing I hadn’t removed was the key.pub. I looked for the key as follows:

$ sudo apt-key list
[…]
pub rsa3072 2019-06-11 [SC]
611D E94A 396F 0135 9C72 AF56 EDAC D29D A10A 4EA6
uid [ unknown] Algorand developers <dev@algorand.com>
sub rsa3072 2019-06-11 [E]
sub rsa3072 2019-11-20 [S] [expires: 2021-11-19]
sub rsa3072 2019-11-20 [E] [expires: 2021-11-19]
[…]

I again ran the commands to remove and purge the package:

$ sudo apt-get remove algorand
$ sudo apt-get purge algorand

I checked that there were no algorand-related files in /etc/apt/sources.list.d (there weren’t).
I removed the two “algorand” lines from /etc/apt/sources.list:

deb https://releases.algorand.com/deb/ stable main
# deb-src https://releases.algorand.com/deb/ stable main

I then removed the algorand key (using the last 8 characters of the key from the list command above):

$ sudo apt-key del A10A4EA6

This was sufficient to fully remove all traces of algorand. “sudo apt-get update” then ran without the previous error.

I repeated the node installation from scratch:

$ sudo apt-get update
$ sudo apt-get install -y gnupg2 curl software-properties-common
$ curl -O https://releases.algorand.com/key.pub
$ sudo apt-key add key.pub
$ sudo add-apt-repository "deb https://releases.algorand.com/deb/ stable main"
$ sudo apt-get update
$ sudo apt-get install -y algorand

And I was happy to see version 2.1.3.

$ algod -v
8590000131
2.1.3.stable [rel/stable] (commit #30c8dd68)
go-algorand is licensed with AGPLv3.0
source code available at https://github.com/algorand/go-algorand
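As an aside, the number on the first line of that output is consistent with the version packed into a single integer as major << 32 | minor << 16 | build, since 2 * 2^32 + 1 * 2^16 + 3 = 8590000131. A small sketch for decoding it follows; `decode_version` is a made-up helper name and the 16-bit field widths are inferred from this one value, so treat the layout as an assumption.

```shell
# Hypothetical helper: unpack an integer of the assumed form
# (major << 32) | (minor << 16) | build into "major.minor.build".
# Requires a shell with 64-bit arithmetic (bash or dash on a 64-bit system).
decode_version() {
    v=$1
    echo "$(( v >> 32 )).$(( (v >> 16) & 0xFFFF )).$(( v & 0xFFFF ))"
}

decode_version 8590000131   # prints 2.1.3
```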

The node had started itself automatically and began syncing. As before, it consumed ~100% CPU but my terminal remained responsive.

I put it in Fast Catchup mode and CPU utilisation fell to roughly 50%; my ancient setup became limited by disk I/O. (A combination of having only 4 GB of memory, so it is using around 1 GB of swap space, and the fact that /var/lib/algorand points to a not very speedy external USB drive.)

I’ll post a follow-up once the catch-up process has finished.

We had mistakenly made that change to the “Origin” value and have since reverted it back to “Algorand”.

Unfortunately, you and others have been affected by that, and we’re very sorry for the inconvenience.

In the meantime, I was able to reproduce this error in a VM and found that the solution was the following:

  1. Remove the repo from /etc/apt/sources.list
  2. apt-get update
  3. Re-add the repo to /etc/apt/sources.list
  4. apt-get update
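The four steps above could be scripted roughly as follows. This is a sketch and not an official procedure: `remove_repo` and `add_repo` are made-up helper names, the `SOURCES` variable is introduced here for illustration, and the repo line is the one quoted earlier in this thread. The `sed` call keeps a `.bak` copy of the file before deleting anything.

```shell
# Hypothetical helpers; SOURCES defaults to the file named in the steps above.
SOURCES="${SOURCES:-/etc/apt/sources.list}"
REPO_LINE='deb https://releases.algorand.com/deb/ stable main'

# Step 1: delete every line mentioning the repo (including commented-out
# deb-src lines), leaving a backup in "$SOURCES.bak".
remove_repo() {
    sed -i.bak '\#releases\.algorand\.com/deb#d' "$SOURCES"
}

# Step 3: append the repo line again.
add_repo() {
    printf '%s\n' "$REPO_LINE" >> "$SOURCES"
}

# Steps 1-4 in order, run as root:
#   remove_repo && apt-get update && add_repo && apt-get update
```

Pointing `SOURCES` at a copy of the file first is an easy way to check the result before touching the real one.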

I see that you did document this step in one of your replies below. I found, however, that I didn’t have to remove the key.

Thanks for the confirmation. Let’s hope that not too many nodes are unable to auto-update - or at least that their admins intervene quickly.

GitHub go-algorand issue #2281 Node stop and start errors. Stopped node still increasing in logs
