Testnet node crash

I’m not sure where this should be reported. I’m seeing a hard node crash inside the recent Docker image, at a seemingly random block. A restart “fixes” the issue, but the node dies again at another random block after a couple of minutes.

root@685172a807f8:~/node/data# cat algod-err.log
panic: sync: negative WaitGroup counter

goroutine 2220 [running]:
sync.(*WaitGroup).Add(0xc0002dc750, 0xffffffffffffffff)
/home/travis/.gimme/versions/go1.12.linux.amd64/src/sync/waitgroup.go:74 +0x135
/home/travis/.gimme/versions/go1.12.linux.amd64/src/sync/waitgroup.go:99 +0x34
github.com/algorand/go-algorand/network.(*WebsocketNetwork).Disconnect(0xc0002dc480, 0x1005e40, 0xc003134240)
/home/travis/gopath/src/github.com/algorand/go-algorand/network/wsNetwork.go:332 +0x8d
github.com/algorand/go-algorand/agreement/gossip.(*networkImpl).Disconnect(0xc000598750, 0xeaf460, 0xc002aa4b40)
/home/travis/gopath/src/github.com/algorand/go-algorand/agreement/gossip/network.go:140 +0x63
github.com/algorand/go-algorand/agreement.(*demux).tokenizeMessages.func1(0xc0004ce600, 0xc000105b00, 0xc000145ce0, 0x106b655, 0x2, 0x10cc658, 0x12f7440, 0xc000598750, 0x12f5980, 0xc00057c200)
/home/travis/gopath/src/github.com/algorand/go-algorand/agreement/demux.go:95 +0x8d2
created by github.com/algorand/go-algorand/agreement.(*demux).tokenizeMessages
/home/travis/gopath/src/github.com/algorand/go-algorand/agreement/demux.go:80 +0x10c
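
For anyone reading the trace: Go's sync.WaitGroup panics with exactly this message when Done (or Add with a negative delta) pushes the counter below zero. The trace shows it being reached through WebsocketNetwork.Disconnect, but not why. One common cause is releasing the same WaitGroup slot twice, for example by disconnecting the same peer more than once. The sketch below reproduces the panic that way and shows a sync.Once guard. It is not go-algorand's code: the peer type and both disconnect methods are hypothetical, made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// peer is a hypothetical stand-in for a network connection.
// It is NOT the type used in go-algorand.
type peer struct {
	wg        *sync.WaitGroup
	closeOnce sync.Once
}

// unsafeDisconnect releases the WaitGroup slot every time it is called.
func (p *peer) unsafeDisconnect() { p.wg.Done() }

// safeDisconnect releases the slot at most once, however often it is called.
func (p *peer) safeDisconnect() { p.closeOnce.Do(p.wg.Done) }

// doubleDisconnect disconnects one peer twice and returns the panic
// message, or "" if nothing panicked.
func doubleDisconnect(safe bool) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var wg sync.WaitGroup
	wg.Add(1) // one slot for one peer
	p := &peer{wg: &wg}
	for i := 0; i < 2; i++ {
		if safe {
			p.safeDisconnect()
		} else {
			p.unsafeDisconnect() // second call drives the counter to -1
		}
	}
	return ""
}

func main() {
	fmt.Printf("unguarded: %q\n", doubleDisconnect(false))
	fmt.Printf("guarded:   %q\n", doubleDisconnect(true))
}
```

In a real node the panic is not recovered, so the whole process exits, which matches the hard crash described above.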

This looks like a bug we’ve already fixed and released. Can you confirm which build you were running when you saw this (./algod -v)?

root@685172a807f8:~/node# ./algod -v
0.2.11.stable-telem [rel/stable] (commit #11f59b61)

Thanks for the reply. We’re on v0.2.16 now, so your node is a few builds behind. Running ./update.sh -d data will upgrade you to the current build, which should help both you and us.

Ah, I see. Since this is running in a Docker container, the update doesn’t seem to complete properly:

root@685172a807f8:~/node# ./update.sh -d data
Current Version = 131083
Latest Version = 131088
New version found
Update Downloaded to /tmp/tmp.hOLqMDXk95/131088.tar.gz
Expanding update...
Validating update...
Starting the new update script to complete the installation...
... Resuming installation from the latest update script
Current Version = 131083
Stopping node...
... node not running
Backing up current binary files...
Backing up current data files from /root/node/data...
Installing new binary files...
Installing new data files into /root/node/data...
Copying genesis files locally
Checking for new ledger in /root/node/data
Updating genesis files for network testnet
Applying migration fixups...
Deleting existing log files in /root/node/data
Starting node in /root/node/data...
/tmp/tmp.hOLqMDXk95/a/bin/update.sh: line 393: systemd-escape: command not found
/tmp/tmp.hOLqMDXk95/a/bin/update.sh: line 393: sudo: command not found
Algorand node failed to start: node exited before we could contact it
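
A side note on the numbers in this output: they appear to line up with the dotted versions mentioned in this thread, since 131083 = 2 × 65536 + 11 (0.2.11) and 131088 = 2 × 65536 + 16 (0.2.16). Below is a minimal sketch of that decoding, assuming the minor version sits in bits 16–31 and the build number in the low 16 bits. The position of the major field (bits 32 and up) is a guess, since it is 0 in both values, and decodeVersion is a hypothetical helper, not part of the Algorand tooling.

```go
package main

import "fmt"

// decodeVersion unpacks a numeric version as printed by update.sh.
// ASSUMPTION: layout is major<<32 | minor<<16 | build. The minor and build
// positions are inferred from the two values in this thread; the major
// position is a guess.
func decodeVersion(v uint64) string {
	major := v >> 32
	minor := (v >> 16) & 0xffff
	build := v & 0xffff
	return fmt.Sprintf("%d.%d.%d", major, minor, build)
}

func main() {
	fmt.Println(decodeVersion(131083)) // 0.2.11
	fmt.Println(decodeVersion(131088)) // 0.2.16
}
```

Read this way, the second "Current Version = 131083" line shows the node was still on 0.2.11 when the new update script took over.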

Thanks for the details. I’m not sure why it’s failing, but we haven’t tested updating inside the container recently. I’ll try to get an updated image deployed soon.
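
In the meantime, the two "command not found" lines in the output show that the update script calls systemd-escape and sudo, and neither is available in this container. Whether that alone explains the failed start is not established here. A quick way to check a container for those helpers before running the script is sketched below. The missingCommands function is a hypothetical helper, and the command list is taken only from the errors above, so the script may need other tools as well.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingCommands returns the names that cannot be found on PATH.
func missingCommands(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Command names come from the two errors reported by update.sh above.
	if m := missingCommands("systemd-escape", "sudo"); len(m) > 0 {
		fmt.Println("missing from PATH:", m)
	} else {
		fmt.Println("all required commands found")
	}
}
```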