Mainnet Fast Catchup Stalling

I have been trying fast catchup on a mainnet node that I brought up 3 days ago. I have nuked it twice because for some reason it doesn’t catch up. The initial catchup is fast as expected, but somewhere around hour 2 it slows to a crawl. I have done some digging in node.log to see if I can find any errors. There are none. There are a lot of warnings, all of this flavor:

{"callee":"github.com/algorand/go-algorand/ledger.(*CatchpointCatchupAccessorImpl).processStagingBalances.func2.1","caller":"github.com/algorand/go-algorand/ledger/catchupaccessor.go:354","file":"dbutil.go","function":"github.com/algorand/go-algorand/util/db.(*Accessor).atomic","level":"warning","line":344,"msg":"dbatomic: tx surpassed expected deadline by 3.131225743s","name":"","readonly":false,"time":"2022-02-07T08:36:02.889957-05:00"}

Surpassed expected deadline.
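
To get a feel for how often this warning fires, one option is to count the matching lines in node.log. This is only a rough sketch: the function name count_deadline_warnings is made up for illustration, and the log path in the usage comment assumes the package-manager data directory mentioned later in this post.

```shell
# Count "surpassed expected deadline" warnings in a node log.
# Hypothetical helper, not part of goal or algod.
# Usage: count_deadline_warnings /var/lib/algorand/node.log
count_deadline_warnings() {
    # grep -c prints the number of matching lines. It exits with status 1
    # when there are no matches, so "|| true" keeps the function from
    # failing in scripts that run under "set -e".
    grep -c 'dbatomic: tx surpassed expected deadline' "$1" || true
}
```

A count that keeps climbing while "Catchpoint accounts processed" barely moves would point at slow database writes rather than at the network.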

Not sure what is going on. I have a feeling that I may have messed up the initial node setup. I am looking to start a non-archival node.

I followed these steps for a clean ubuntu install:

Added to my bashrc:

export ALGORAND_DATA=/var/lib/algorand

Then:

sudo apt-get update
sudo apt-get install -y gnupg2 curl software-properties-common
curl -O https://releases.algorand.com/key.pub
sudo apt-key add key.pub
sudo add-apt-repository "deb [arch=amd64] https://releases.algorand.com/deb/ stable main"
sudo apt-get update
sudo apt-get install -y algorand-devtools
algod -v

The expected result, showing algod working:

12885032963
3.2.3.stable [rel/stable] (commit #d2289a52)
go-algorand is licensed with AGPLv3.0
source code available at https://github.com/algorand/go-algorand

then (I am not sure if I was supposed to do this part):

mkdir ~/node
cd ~/node
wget https://raw.githubusercontent.com/algorand/go-algorand-doc/master/downloads/installers/update.sh
chmod 544 update.sh
./update.sh -i -c stable -p ~/node -d ~/node/data -n

Then to check operation I tried:

goal node start
pgrep algod

This returned 4478, so it is working.
I then went in and set up my config files:

systemctl stop algorand
cd $ALGORAND_DATA
sudo mv config.json.example config.json
systemctl start algorand

I followed the steps under “Sync Node Network using Fast Catchup”, selecting the MainNet link.
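
For anyone scripting this step: the catchpoint label passed to goal node catchup has the form round#hash, like the one in the status output below. Here is a small sanity check before handing a label to goal. It is a sketch only: is_catchpoint_label is a hypothetical helper, and the pattern (digits, then #, then uppercase base32 characters) is an assumption based on the labels seen in this thread, not on a format specification.

```shell
# Return success if the argument looks like a catchpoint label (round#hash).
# Hypothetical helper; the pattern is inferred from labels in this thread.
is_catchpoint_label() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+#[A-Z2-7]+$'
}

# Usage (quote the label so the shell does not treat "#" specially):
#   label='19080000#WR6MGCF2JKEKS6CMUCOCYPZQE4WSHPJPRXV5NO7GSLJYVWCO63KQ'
#   is_catchpoint_label "$label" && goal node catchup "$label"
```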

Checking the status it appears to be working, but it ran for 11 hours and got stuck about halfway yesterday. Just restarted it this morning. Please see status below:

johnm@linuxdesktop:/var/lib/algorand$ goal node status
Last committed block: 44060
Sync Time: 92.2s
Catchpoint: 19080000#WR6MGCF2JKEKS6CMUCOCYPZQE4WSHPJPRXV5NO7GSLJYVWCO63KQ
Catchpoint total accounts: 9544326
Catchpoint accounts processed: 149504
Catchpoint accounts verified: 0
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=

Any feedback/guidance would be greatly appreciated, especially on the setup portion. I can nuke it and try again, just want to get it up and running.

Thanks,
John

This is usually due to:

  • either too little RAM: can you check your RAM use? Do you have at least 4GB of RAM?
  • or a disk that is too slow: an HDD will be too slow (even in RAID). A SATA SSD may be too slow. We now only recommend NVMe SSDs.
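
Both points can be checked quickly on Linux. The sketch below reads total memory from /proc/meminfo and the kernel's "rotational" flag from sysfs. The function names total_ram_mib and disk_kind are made up for illustration, and the optional path arguments exist only so the functions can be pointed at other files. Note that the flag only separates spinning disks from solid-state ones; it does not tell a SATA SSD from an NVMe drive.

```shell
# Print total RAM in MiB from a meminfo-style file (default: /proc/meminfo).
# MemTotal is reported in kB, so dividing by 1024 gives MiB.
total_ram_mib() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Classify a block device by its sysfs "rotational" flag.
# Usage: disk_kind sda        (device name as listed under /sys/block)
disk_kind() {
    case "$(cat "${2:-/sys/block}/$1/queue/rotational" 2>/dev/null)" in
        1) echo "rotational (HDD)" ;;
        0) echo "non-rotational (SSD or NVMe)" ;;
        *) echo "unknown" ;;
    esac
}
```

For the 4GB figure above, total_ram_mib should report roughly 4096 or more. The device to check is the one holding the node's data directory.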

okay and thank you!

I will nuke and retry with an NVMe SSD. During the installation part of my setup, is there a need to do the:

mkdir ~/node
cd ~/node
wget https://raw.githubusercontent.com/algorand/go-algorand-doc/master/downloads/installers/update.sh
chmod 544 update.sh
./update.sh -i -c stable -p ~/node -d ~/node/data -n

part? It seems redundant if I set up with the package manager.

Yes, it is redundant: you need to install the node either via the package manager or via the above method, not both.
Can you show us the part of the documentation that asks to install using both methods?

It doesn’t say directly. I was watching a tutorial video that I believe led me astray. So, to make sure I have this right: if I follow the directions in the following sections, I should get an operating node:

Overview
Installation with a package manager
Sync Node with Network
Sync Node Network using Fast Catchup

Thanks!

Just re-installed so going through the setup now!

looks like it is working now:

$ goal node status
Last committed block: 20063
Sync Time: 5.9s
Catchpoint: 19100000#QVCGM2BY4CCBTCEG4MLEXY553FT762FEWHONHZ2SLQA64UOCIAOA
Catchpoint total accounts: 9549239
Catchpoint accounts processed: 92672
Catchpoint accounts verified: 0
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=
$ goal node status -w 1000
Last committed block: 20063
Sync Time: 26.6s
Catchpoint: 19100000#QVCGM2BY4CCBTCEG4MLEXY553FT762FEWHONHZ2SLQA64UOCIAOA
Catchpoint total accounts: 9549239
Catchpoint accounts processed: 426496
Catchpoint accounts verified: 0
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=

leaps and bounds fast, stupid HD!

$ goal node status -w 1000
Last committed block: 20063
Sync Time: 284.1s
Catchpoint: 19100000#QVCGM2BY4CCBTCEG4MLEXY553FT762FEWHONHZ2SLQA64UOCIAOA
Catchpoint total accounts: 9549239
Catchpoint accounts processed: 4266496
Catchpoint accounts verified: 0
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=
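
For watching progress without eyeballing the raw counts, the two catchpoint lines can be turned into a percentage. This is a minimal sketch that assumes the status output keeps the exact field labels shown above; catchup_progress is a hypothetical helper, not a goal subcommand.

```shell
# Read "goal node status" output on stdin and print the share of
# catchpoint accounts processed so far.
# Usage: goal node status | catchup_progress
catchup_progress() {
    awk -F': ' '
        /^Catchpoint total accounts:/     { total = $2 }
        /^Catchpoint accounts processed:/ { processed = $2 }
        END {
            if (total > 0) printf "%.1f%%\n", 100 * processed / total
            else print "no catchpoint in progress"
        }'
}
```

Fed the last status above (4266496 of 9549239 accounts), it prints 44.7%.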