I am also experiencing a deadlock on my MainNet archival node. I build from source and periodically stop/start my nodes. Today I noticed my MainNet node was stopped and both TestNet and BetaNet nodes were operational. No worries, it will catch up, I thought. I built master
and stopped the other two nodes. They both restarted fine and are operational (however, those are not archival). The MainNet node produced the following (truncated) node.log
file:
{"file":"fetcher.go","function":"github.com/algorand/go-algorand/catchup.(*NetworkFetcher).FetchBlock","level":"info","line":219,"msg":"networkFetcher.FetchBlock: asking client r-si.algorand-mainnet.network:4160 for block 8191384","time":"2020-08-06T18:17:04.733513Z"}
{"file":"fetcher.go","function":"github.com/algorand/go-algorand/catchup.(*NetworkFetcher).FetchBlock","level":"info","line":219,"msg":"networkFetcher.FetchBlock: asking client r-cs.algorand-mainnet.network:4160 for block 8191385","time":"2020-08-06T18:17:05.271563Z"}
{"Context":"Agreement","Hash":"","ObjectPeriod":0,"ObjectRound":0,"ObjectStep":0,"Period":0,"Round":8191369,"Sender":"","Step":0,"Type":"RoundInterrupted","Weight":0,"WeightTotal":0,"file":"demux.go","function":"github.com/algorand/go-algorand/agreement.(*demux).next","level":"info","line":285,"msg":"agreement: round 8191369 ended early due to concurrent write;$
{"file":"fetcher.go","function":"github.com/algorand/go-algorand/catchup.(*NetworkFetcher).FetchBlock","level":"info","line":219,"msg":"networkFetcher.FetchBlock: asking client r-fr.algorand-mainnet.network:4160 for block 8191386","time":"2020-08-06T18:17:05.522169Z"}
{"Context":"Agreement","Hash":"","ObjectPeriod":0,"ObjectRound":0,"ObjectStep":0,"Period":0,"Round":8191370,"Sender":"","Step":0,"Type":"RoundInterrupted","Weight":0,"WeightTotal":0,"file":"demux.go","function":"github.com/algorand/go-algorand/agreement.(*demux).next","level":"info","line":285,"msg":"agreement: round 8191370 ended early due to concurrent write;$
{"file":"fetcher.go","function":"github.com/algorand/go-algorand/catchup.(*NetworkFetcher).FetchBlock","level":"info","line":219,"msg":"networkFetcher.FetchBlock: asking client r13.algorand-mainnet.network:80 for block 8191387","time":"2020-08-06T18:17:05.545599Z"}
{"file":"deadlockLogger.go","function":"github.com/algorand/go-algorand/daemon/algod.(*dumpLogger).dump","level":"error","line":36,"msg":"[Stack] goroutine 66 [running]:\nruntime/debug.Stack(0xc000080820, 0xc000010570, 0xc00a940780)\n\truntime/debug/stack.go:24 +0x9f\ngithub.com/algorand/go-algorand/logging.logger.Error(0xc000080820, 0xc000010570, 0xc00a8cfc70,$
{"file":"deadlockLogger.go","function":"github.com/algorand/go-algorand/daemon/algod.(*dumpLogger).dump","level":"error","line":36,"msg":"POTENTIAL DEADLOCK:\nPrevious place where the lock was grabbed\ngoroutine 86 lock 0xc000259788\ngithub.com/algorand/go-algorand/ledger/ledger.go:333 ledger.(*Ledger).notifyCommit ??? \u003c\u003c\u003c\u003c\u003c\ngithub.com$
{"file":"deadlockLogger.go","function":"github.com/algorand/go-algorand/daemon/algod.setupDeadlockLogger.func1","level":"error","line":57,"msg":"[Stack] goroutine 66 [running]:\nruntime/debug.Stack(0xc000080820, 0xc000010570, 0xc00a941400)\n\truntime/debug/stack.go:24 +0x9f\ngithub.com/algorand/go-algorand/logging.logger.Panic(0xc000080820, 0xc000010570, 0xc00a$
{"file":"deadlockLogger.go","function":"github.com/algorand/go-algorand/daemon/algod.setupDeadlockLogger.func1","level":"panic","line":57,"msg":"potential deadlock detected","time":"2020-08-06T18:17:35.564844Z"}
algod-err.log
:
goroutine 8320 [select]:
net/http.(*persistConn).writeLoop(0xc00524dc20)
net/http/transport.go:2277 +0x11e
created by net/http.(*Transport).dialConn
net/http/transport.go:1648 +0xc7d
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
panic: (*logrus.Entry) (0x1a30dc0,0xc00a941630) [recovered]
panic: (*logrus.Entry) (0x1a30dc0,0xc00a941630)
goroutine 66 [running]:
github.com/algorand/go-algorand/logging.logger.Panic.func1(0xc000080820, 0xc000010570)
github.com/algorand/go-algorand/logging/log.go:256 +0x89
panic(0x1a30dc0, 0xc00a941630)
runtime/panic.go:969 +0x16a
github.com/sirupsen/logrus.Entry.log(0xc000076ea0, 0xc00a9682a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
github.com/sirupsen/logrus@v1.0.5/entry.go:112 +0x2f5
github.com/sirupsen/logrus.(*Entry).Panic(0xc00a9415e0, 0xc00a9720d0, 0x1, 0x1)
github.com/sirupsen/logrus@v1.0.5/entry.go:182 +0x104
github.com/algorand/go-algorand/logging.logger.Panic(0xc000080820, 0xc000010570, 0xc00a9720d0, 0x1, 0x1)
github.com/algorand/go-algorand/logging/log.go:260 +0x170
github.com/algorand/go-algorand/daemon/algod.setupDeadlockLogger.func1()
github.com/algorand/go-algorand/daemon/algod/deadlockLogger.go:57 +0x24a
github.com/algorand/go-deadlock.lock(0xc00548e060, 0x1a85920, 0xc000259788, 0x1)
github.com/algorand/go-deadlock@v0.2.1/deadlock.go:252 +0x833
github.com/algorand/go-deadlock.(*RWMutex).RLock(0xc000259788)
github.com/algorand/go-deadlock@v0.2.1/deadlock.go:152 +0xc1
github.com/algorand/go-algorand/ledger.(*Ledger).Lookup(0xc000259000, 0x7cfc4d, 0xa32dc9e980015614, 0x306e3572c8a31a17, 0xc14ba7c4967feaa8, 0x8fcb949c92685a75, 0x0, 0x0, 0x0, 0x0, ...)
github.com/algorand/go-algorand/ledger/ledger.go:392 +0x9b
github.com/algorand/go-algorand/data.(*Ledger).BalanceRecord(0xc000263060, 0x7cfc4d, 0xa32dc9e980015614, 0x306e3572c8a31a17, 0xc14ba7c4967feaa8, 0x8fcb949c92685a75, 0x0, 0x0, 0x0, 0x0, ...)
github.com/algorand/go-algorand/data/ledger.go:191 +0x9d
github.com/algorand/go-algorand/agreement.membership(0x1aae980, 0xc000263060, 0xa32dc9e980015614, 0x306e3572c8a31a17, 0xc14ba7c4967feaa8, 0x8fcb949c92685a75, 0x7cfd8d, 0x0, 0x2, 0x0, ...)
github.com/algorand/go-algorand/agreement/selector.go:67 +0x18d
github.com/algorand/go-algorand/agreement.unauthenticatedVote.verify(0xa32dc9e980015614, 0x306e3572c8a31a17, 0xc14ba7c4967feaa8, 0x8fcb949c92685a75, 0x7cfd8d, 0x0, 0x2, 0x0, 0x7649dd7a1025ef2c, 0x6def3595f4d49cdb, ...)
github.com/algorand/go-algorand/agreement/vote.go:91 +0xf6
github.com/algorand/go-algorand/agreement.(*AsyncVoteVerifier).executeVoteVerification(0xc0018b2050, 0x19abfe0, 0xc0052b5000, 0x2, 0x2)
github.com/algorand/go-algorand/agreement/asyncVoteVerifier.go:105 +0x26b
github.com/algorand/go-algorand/util/execpool.(*pool).worker(0xc0001cdd40)
github.com/algorand/go-algorand/util/execpool/pool.go:159 +0x122
created by github.com/algorand/go-algorand/util/execpool.MakePool
github.com/algorand/go-algorand/util/execpool/pool.go:85 +0x172```