Question about fast catchup vs normal catchup

In the developer portal, the following is stated:

Warning: Bootstrapping from a snapshot bypasses the normal node catchup procedure that cryptographically verifies the whole history of the blockchain - a procedure that is imperative to maintaining a healthy network.

Questions:

  • Why is normal node catchup imperative to maintaining a “healthy” network?

  • What is the meaning of the word “healthy” in this context?

Let me use a metaphor here.

Imagine being the observer of a pool table with 15 balls on it. A physicist comes along, writes some equations on a blackboard and says: “Look! This set of equations, which obeys the inviolable laws of physics, describes exactly how the balls’ trajectories evolve on the pool table over time!”

Given:

  1. the initial state S(0) of the system (that is, the position and velocity of each ball on the pool table at t_0);
  2. the set of equations Eqns(t) written by the physicist;

As an observer you will be able to draw the balls’ trajectories over time, ending up with exactly the “state of the system” S(t) (that is, the position and velocity of each ball on the pool table at any given time t) after any elapsed time dt.

Context
The balls have been rolling on the pool table for 24 hours; as an observer, you want to determine the current state of the system.

Scenario 1:
Starting from the known initial state S(0), you use the set of equations to calculate the trajectories step by step. This requires quite an effort in doing the math but, at the end of all the calculations, you will be able to determine not only the state of the system after 24 hours (going from S(0) to S(24h)) but also the trajectory of any ball on the pool table at any point in past history.

Scenario 2:
This time another observer comes along, saying: “Trust me, the positions and the velocities of each ball on the pool table after 23 hours were the following: …”. Let’s call this state of the system S(23h). Your effort in doing the math is now much smaller, since you just have to calculate one hour of trajectory evolution to go from S(23h) to S(24h). In this case you will not be able to know the trajectory of the balls on the pool table at any point in past history, since you only drew the evolution of the balls’ trajectories for the last hour.

In the context of the metaphor, Scenario 2 requires you to trust the observer who gave you the state of the system after 23 hours. In Scenario 1, on the other hand, you can draw the whole evolution of the balls’ trajectories and determine the latest state of the system by yourself, without having to trust anybody other than yourself and the laws of physics.

As an observer you can choose to draw and store all the trajectories from the beginning, struggling with the math but trusting nothing except yourself and the laws of physics, or you can skip some math, trusting other observers, and draw just the last part of the trajectories.
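Since the “physics” in the metaphor is deterministic, the two scenarios can be sketched in a few lines of Python. The update rule (each ball simply moves by its velocity every step) and all the numbers are illustrative assumptions, not anything stated in the metaphor itself:

```python
def step(state):
    """One time step: advance each (position, velocity) ball (toy rule)."""
    return [(pos + vel, vel) for pos, vel in state]

def evolve(state, steps):
    """Apply the rule repeatedly, keeping the full trajectory."""
    history = [state]
    for _ in range(steps):
        state = step(state)
        history.append(state)
    return history

s0 = [(0.0, 1.0), (5.0, -0.5)]        # S(0): two balls on the table

# Scenario 1: replay all 24 "hours" from S(0); you get every past state.
full_history = evolve(s0, 24)
s24_verified = full_history[-1]

# Scenario 2: accept a claimed S(23h) from another observer and compute
# just one more step; the earlier trajectory remains unknown to you.
s23_claimed = full_history[23]        # stands in for the trusted claim
s24_trusted = step(s23_claimed)

print(s24_verified == s24_trusted)    # True: same final state either way
```

Both routes agree on S(24h); the difference is only in how much history you hold and how much trust you placed in others.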

Now, let’s go outside the metaphor, replacing:

  1. Law of physics and the set of equations → Algorand PPoS consensus protocol;
  2. Initial state of the system S(0) → Genesis Block
  3. Observers → Algorand nodes;
  4. Balls on the pool table → Algorand Accounts;
  5. Positions and velocities of the balls → Balances and local states of the Accounts;
  6. Trajectories → Blockchain ledger;
  5. Cue stick strikes → Transactions;
  8. Time → Blocks;

Context
The blockchain has been growing for N blocks; as a node runner, you want to be synchronized with the latest state of the distributed system.

Scenario 1:
Starting from the known genesis block, you use the consensus protocol to verify the transactions step by step. This requires quite an effort in doing the validation but, at the end of all the computation, you will be able to determine not only the state of the distributed system after N blocks (going from S(0) to S(N)) but also the transactions of any account at any time in past history.

Scenario 2:
This time another node runner comes along, saying: “Trust me, the state of the blockchain ledger at block M is the following: …”. Let’s call this state of the system S(M), represented by a Fast Catchup checkpoint. Your effort in doing the validation is now much smaller, since you just have to verify transactions for N-M blocks to go from S(M) to S(N). In this case you will not be able to know the transactions of any account at any time in past history, since you only verified the transactions for the last N-M blocks.

Scenario 2 requires you to trust the node runner who gave you the state of the distributed system after M blocks. In Scenario 1, on the other hand, you can verify the whole transaction history and determine the latest state of the distributed system by yourself, without having to trust anybody other than your node and the consensus protocol.

As a node runner you can choose to verify and store all the transactions from the beginning, struggling with the verification but trusting nothing except your node and the consensus protocol, or you can skip some verification, trusting other node runners, and verify just the last part of the transactions.

Verifying the transactions from the beginning keeps the system healthy because it reduces the amount of mutual trust between node runners: the whole blockchain gets validated multiple times, and several copies of the whole blockchain are kept (by Archival Nodes).
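The trust trade-off can be made concrete with a toy hash chain. This is a minimal sketch, not Algorand’s actual consensus or catchpoint format: each block commits to the previous block’s hash, and “verification” simply re-checks those links, either from the genesis block (Scenario 1) or from a trusted checkpoint at block M (Scenario 2). All names and the transaction payloads are illustrative:

```python
import hashlib
import json

def block_hash(prev_hash, txns):
    """Toy block hash: commit to the previous hash and the block payload."""
    payload = json.dumps([prev_hash, txns], sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def build_chain(txns_per_block):
    """Build a toy hash chain; block 0 plays the role of the genesis block."""
    chain, prev = [], ""
    for txns in txns_per_block:
        blk = {"prev": prev, "txns": txns, "hash": block_hash(prev, txns)}
        chain.append(blk)
        prev = blk["hash"]
    return chain

def verify_from(chain, start):
    """Re-verify every link from index `start` onward (0 = genesis)."""
    for i in range(max(start, 1), len(chain)):
        blk = chain[i]
        if blk["prev"] != chain[i - 1]["hash"]:
            return False
        if blk["hash"] != block_hash(blk["prev"], blk["txns"]):
            return False
    return True

chain = build_chain([["genesis"], ["pay A->B"], ["pay B->C"], ["pay C->A"]])

M = 2
assert verify_from(chain, 0)   # Scenario 1: verify all N blocks from genesis
assert verify_from(chain, M)   # Scenario 2: trust S(M), verify only N-M blocks

# Tampering with history below the checkpoint goes undetected in Scenario 2:
chain[1]["txns"] = ["forged payment"]
print(verify_from(chain, 0))   # False: full verification catches the forgery
print(verify_from(chain, M))   # True:  the checkpoint route never looks there
```

The last two lines show exactly why the checkpoint must come from a trustworthy source, and why nodes that verify from the genesis block contribute to the network’s health: they are the ones able to catch a forged history.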


Thank you for your well-crafted response @cusma, this cleared it all up.

Question: 100 years from now (assuming we haven’t been wiped out of existence by climate change or robots), the number of transactions that need to be verified is probably going to be astronomical. Under Scenario 1, where you need to start from the genesis block to verify the transactions step by step until S(N), I assume that would take an enormous amount of time and also a ton of data storage for Archival nodes (I haven’t done the math on this–these are just my naive assumptions). Under that scenario, is normal node catchup only going to be an option for large data centers? Might this reduce the “health” of the blockchain long term?

Technologies like Algorand State Proofs are going to enable a kind of “compression” of the blockchain’s verification, so that even with years of ledger growth there will be a way to keep the catchup process efficient and decentralized without trading it off against the network’s health.

That being said, the point about “transaction storage space” and “ledger history inspection” is something different, and realistically it does not sound like a high-priority bottleneck or blocker right now. There is plenty of margin in the current storage solutions.

In my personal view, fast-forwarding the topic, this should be addressed at the ecosystem-growth level, meaning that the solutions will have to be shaped by the evolution of the network itself, both in terms of infrastructure and incentives, to ensure data persistence and availability.
