Fast catchup for mainnet

Trying to sync mainnet. In [Install a Node - Algorand Developer Portal] it says “minutes”, and this I can understand: the 13970000#… catchpoint is less than 10,000 blocks away (at the time of writing). But before processing blocks, the node processes accounts: ~14 million of them. I also noticed that restarting the node at this stage (i.e. stop/start/restart while processing accounts) makes it start over from scratch. Please comment.
So, the question: is it actually possible to sync to mainnet in minutes (as in dev docs)?

I think “minutes” is a bit optimistic now that the network has grown so much.
On my computer, it takes about 10 minutes to fully sync a node via fast catchup.

Still unclear to me. 14 million / 10 minutes = 23,333 accounts/second, with block processing on top of that. That does not happen in my case… in fact, it takes hours to process the accounts.
And what about the stop/start behavior? I see that account processing starts over from the very beginning, so all 14 million accounts have to be processed again.

It is 14 million blocks, not accounts… AlgoExplorer shows 11 million accounts and 14 million blocks…

If you do the fast catchup, as fabrice suggested, you just download the list of all 11 million accounts and the last 1000 blocks… It is quite possible to do that within a few minutes…

If you want to download the whole history of all blocks, it takes quite some time (days or weeks)… fabrice, is there some other way to sync a relay node and an indexer node than going through the whole catchup phase?

I think you unfortunately cannot stop/start fast catchup at the moment.

@debl, regarding the fact that it takes hours for you:

  • Are you using a recent SSD? Nowadays, using a hard drive or a slow SSD (like an old SATA one) will not work due to the number of accounts on the Algorand blockchain.
  • Do you have enough RAM? You will have difficulty if you have less than 4GB of RAM.
  • Is your Internet connection good enough? You most likely want at least 100 Mbps.
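To get a rough idea whether the disk is the weak point in that checklist, a crude sustained-write check can help. This is just a sketch of mine, not an official Algorand tool; the file name and the 256 MiB size are arbitrary, and a real benchmark (e.g. `fio`) would be more rigorous:

```python
# Crude sustained-write check: write 256 MiB, fsync, and report MB/s.
# A slow HDD will typically show a far lower rate than a modern SSD/NVMe.
import os
import time

PATH = "disk_speed_test.tmp"   # put this on the drive holding the node data
CHUNK = b"\0" * (1 << 20)      # 1 MiB buffer
MIB = 256                      # total amount to write

start = time.time()
with open(PATH, "wb") as f:
    for _ in range(MIB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())       # force the data to actually reach the disk
elapsed = time.time() - start
os.remove(PATH)

print(f"{MIB / elapsed:.0f} MB/s sustained write")
```

Run it from the node's data directory so it measures the same drive the ledger lives on.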

@scholtz, regarding fast catchup for the full blockchain, this is currently not possible.
The full blockchain is around 500 GB, so whatever happens you will need to download those 500 GB and store them in the local database, which will take some time.
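For a sense of scale, here is my own back-of-the-envelope arithmetic, assuming a 100 Mbps link and ignoring verification and database-write time entirely (both numbers are from this thread, not official figures):

```python
# Time to download ~500 GB of blockchain data over a 100 Mbps connection,
# counting only the raw transfer (no verification, no DB writes).
blockchain_gb = 500   # approximate full-chain size mentioned above
link_mbps = 100       # assumed home connection speed

seconds = blockchain_gb * 8 * 1000 / link_mbps  # GB -> gigabits -> megabits -> s
hours = seconds / 3600
print(f"~{hours:.1f} hours just for the raw download")  # ~11.1 hours
```

In practice certificate verification and disk writes dominate, which is why the full sync takes days rather than half a day.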
I guess it would still be possible to make the process faster by having a faster mechanism to verify blocks.
Currently, all the certificates for all the blocks are verified.

What about having some relay node runners publish the 500 GB database somewhere once a week? It would be much more efficient to catch up from that than to download the data from the start and verify all transactions one by one… ?

Thank you for the explanations. Indeed, I have an HDD (320 GB), 3 GB DDR3 RAM, and Internet < 100 Mbps :joy:

I did some monitoring last night and saw that after approx. 4.5 million accounts are processed, some reset happens and it all starts from the beginning:

So, despite my outdated laptop hardware, what is the limiting factor here? And why does the reset happen? I see it every ~2 hours, once ~4.5 million accounts have been processed.
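For what it's worth, those numbers imply a rough processing rate. This is my own arithmetic, using the 11 million account figure mentioned earlier in the thread:

```python
# Effective account-processing rate implied by the observed reset:
# ~4.5 million accounts processed in a ~2 hour window.
accounts_before_reset = 4_500_000
window_seconds = 2 * 3600

rate = accounts_before_reset / window_seconds
print(f"~{rate:.0f} accounts/second")  # ~625 accounts/second

total_accounts = 11_000_000  # per AlgoExplorer, see above
hours_needed = total_accounts / rate / 3600
print(f"~{hours_needed:.1f} hours to get through all accounts")  # ~4.9 hours
```

So at this rate the whole account phase needs well over 2 hours, meaning it can never finish before whatever resets the connection kicks in.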

@fabrice, you said “…using a hard drive or a slow SSD (like an old SATA one) will not work…”. Why so? What is the technical reason for this?

The limitation that @fabrice was referring to originates from the transaction throughput:
In order to support 1,000 transactions per second, the node needs to be able to update ~3,000 accounts/second. As long as this is only a short peak, the node can cache the changes in memory before flushing them to disk. However, to sustain that TPS continuously, the node needs to be able to flush the changes to disk at that rate.
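The arithmetic behind that sizing can be spelled out. The factor of ~3 accounts per transaction is my reading of the numbers above (e.g. sender, receiver, and fee sink for a payment), not an official figure:

```python
# Sustained disk-write load implied by the target throughput:
# each transaction touches roughly 3 account records (assumption).
tps = 1000              # target transactions per second
accounts_per_txn = 3    # assumed accounts touched per transaction

updates_per_sec = tps * accounts_per_txn
print(f"{updates_per_sec} account updates/second")  # 3000 account updates/second
```

Each update is a small random write, which is exactly the access pattern spinning disks are worst at, hence the SSD requirement.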

While I believe that there are HDDs that would be sufficiently performant, most of them are not. On the flip side, most SSDs do provide sufficient disk write performance. Taking this further, all the NVMe drives that I have tested (so far) were sufficiently fast.

During the “account processed” phase, the node is performing three tasks:

  • Loading the accounts from a network data stream
  • Decoding the accounts
  • Writing the accounts to disk

If I had to guess, I would make the same guess as you did: it is most likely the disk that is slowing things down. In the config.json file, there is a variable MaxCatchpointDownloadDuration which controls the maximum duration the client (i.e. your node) will keep the connection open. The default is 2 hours, which seems to align with your observation. You could try quadrupling that to see if it helps you get past this stage. However, please be advised that running with a slow disk could trigger other issues (that haven't been reported/found yet).
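For reference, the override goes in config.json in the node's data directory. This sketch assumes the value is in seconds (a 2-hour default of 7200, quadrupled to 28800); only the overridden key is shown, and any other settings already in your file stay as they are:

```json
{
    "MaxCatchpointDownloadDuration": 28800
}
```

Restart the node after editing for the change to take effect.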


After changing MaxCatchpointDownloadDuration from 7200 to 28800, it took ~6 hours to complete the sync. Thank you for the support!

Up to now I have been reading the Dev Docs pages and this forum. Are there other resources (besides the source code) with a higher level of detail?