I did an experiment, I created a private network and made the node with 25% stake offline, this made the progress of chain stop.
I can’t seem to understand, why is this happening? All the remaining 75% stakes are of the honest nodes, so shouldn’t the committees and block still be formed?
The Algorand blockchain requires a bit less than 80% of stake honestly participating to produce blocks.
This is because the Algorand blockchain favors security/no-forking/consistency over availability/liveness.
Other blockchains make other trade-offs and allow forking in order to always be able to produce blocks: this is the case of Bitcoin for example. In Bitcoin, if the network is split in two and nodes cannot communicate between each other, each part will grow a different blockchain, and this is a fork. This can never happen on Algorand.
Note that at a high-level, if you want to support network partition / asynchronous networks and if you want no-forking, the best you can get is that 66% of the stake needs to be honest and participate for the blockchain to produce blocks.
The 80% from Algorand comes from a trade-off due to the sortition algorithm. The sortition algorithm allows Algorand to scale even with millions of nodes, allowing amazing decentralization. The cost to it is this slight loss in parameter (66% to 80% of honest stake).
Here are some more details:
- The CAP theorem essentially shows that a blockchain either is consistent (aka no-forking) or is available (i.e., always can produce blocks), assuming you want the blockchain to support network partitions, which I think is a must: Internet partitions happened in the past! See for example BGP event sends European mobile traffic through China Telecom for 2 hours | Ars Technica
- 2/3 or 66% of honest stake is necessary to prevent forking in the case network partition is possible (or, in cryptography/blockchain terms, when the network is asynchronous or partially synchronous). You can see for example Consensus cheat sheet (in particular the cell “f >= n/3” is impossible, which means tolerating more than 1/3 malicious node in term of stake is impossible - proof is here: Byzantine Agreement is impossible for $n \leq 3 f$ under partial synchrony)
But see we have 75% of honest stakes, can’t they form a committee, it’s not like that the other 25% are byzantine and will not vote, they are just offline. So the honest users (75% stake) should be able to form a committee, right? It’s not like we have all the 75% users (stake) in that committee. All the nodes in the committee will be honest, so they should be able to produce a block. Can you please clarify where am I going wrong?
Also why is there a loss in parameter (66% to 80% of honest stake)?
In the scenario that I specified, its not like that the chain’s progress is stopping, just block time is getting increased by so much which gives the impression that the chain has stopped progressing, right?
Let’s look at the situation without sortition first. This is much simpler.
In that case, if you want the blockchain to never fork (like Algorand), then:
if more than 1/3 of the nodes are not running, then the blockchain won’t produce blocks.
(I say “not running” instead of “offline” just to avoid the potential confusion with the notion of “offline accounts”.)
This is in essence what the impossibility result Byzantine Agreement is impossible for $n \leq 3 f$ under partial synchrony say. But let me reformulate it on an example.
Let’s suppose you have n = 10 nodes and stake is distributed evenly.
Let’s suppose that the attacker controls t = 3 nodes and can partition the network (as I argued above, you cannot rule this out for real-world public blockchains that use the Internet).
I am proving the following:
If the blockchain still continue producing blocks (i.e., has liveness) when t+1 nodes are not running, then the attacker can fork the blockchain.
Here is how the attacker does it.
Suppose nodes are called 1,2,…,10, and the attacker controls nodes 1,2,3.
Nodes 4,…,10 are honest.
The attacker split the network to ensure that:
nodes 4,5,6 cannot communicate with nodes 7,8,9,10.
Now, the attacker can run two blockchains that do not communicate:
chain A has honest nodes 4,5,6 and attacker nodes 1,2,3
chain B has honest nodes 7,8,9,10 and attacker nodes 1,2,3
Note that:
- The attacker nodes can act differently in chain A and chain B .
- From nodes 4,5,6 point of view, they’re just in a chain with 10 nodes, t+1=4 of which are just not running (nodes 7,8,9,10 appear not running)
- From nodes 7,8,9,10 point of view, they’re just in a chain with 10 nodes, 3 of which are just not running (nodes 4,5,6 appear not running)
Now, since the blockchain is supposed to still produce blocks when t+1 nodes are not running, chain A and chain B should produce blocks.
The attacker created a fork!
Importantly, note that this argument is completely independent of how the blockchain/consensus protocol works. The attack above works for absolutely any blockchain.
Let me rephrase it informally:
If a blockchain cannot fork when an attacker controls the network AND controls <1/3 of the stake, then the blockchain cannot produce blocks when <2/3 of the stake is running properly.
(not 100% sure about < vs <=, but this this is the gist of it)
Let’s now address the loss of parameters due to sortition.
For the Algorand blockchain to be secure you informally need 2/3 of each committee to be honest (this is a bit more complex and not completely true but this is sufficient as a first approximation). Otherwise, a fork is possible.
The issue is that an attacker controlling 1/3 of the total online Algos (those marked online in the protocol - which for other blockchains would be called “staked”), may have a bit more than 1/3 of the seats in a committee by chance.
Suppose there are 100 online accounts, 33 of which are controlled by the attacker. If the committee size was 10, the attacker may regularly get 4 seats out of the 10 and conduct an attack.
So we need to increase the parameters.
A real-world analogy is the following: committees are like polls for elections: you only poll a small number of people to then guess the election result. But there is always some errors! So unless, the result is very clear (60% candidate A, 40% candidate B in the poll), the result of the election may be different from the poll.
Polls have additional issues that people answering the poll may not do it honestly, and polling a random sample of the population is hard (because some people never answer polls).
Algorand does not have these additional issues, but still, there will be some error introduced by the sortition algorithm. These errors have been properly analyzed by the team to ensure security and liveness of the protocol.
Note that 80% is not a magic number. Actually, if I’m not mistaken (I’ve not checked in details), we could achieve any number larger than 2/3. But the closer you are to 2/3, the bigger the committees need to be and the less efficient the protocol will be.
80% is a sweet spot: it degrades the parameters not too much while providing a very efficient protocol.