I’m an ALGO holder with some questions/concerns about security of the Algorand protocol.
(1) How many relay nodes run in the Algorand network? What measures are in place to keep their physical locations a secret?
(2) How many archival nodes run on the Algorand network? Is it a very large number (I hope)?
My impression, based on Algorand node types - Algorand Developer Portal, is that only archival nodes store complete copies of the entire blockchain (as opposed to only the last 1000 blocks). While complete archived copies of the blockchain are tamper-proof with overwhelming probability, couldn’t they nonetheless be corrupted by some bad actor? Upon auditing the corrupted, archived blockchain, anybody could determine that it’s been corrupted - that’s a consequence of being “tamper-proof” - …but if many (or maybe even just some) of the archival nodes have been corrupted, mightn’t the corrupted, archived blockchain create a crisis of confidence in the entire network?
e.g. Suppose Bob is a bad actor. He manages to corrupt some of the archived blockchains.
[ One possible incentive to do so: perhaps the blockchain contains some compromising information about Bob, e.g. maybe it shows that he has lots of debt and the record of his debt would be wiped out if the blockchain simply didn’t exist. ]
Bob declares to the network that the ledger at the current moment is wrong. To address Bob’s challenge, the nodes of the network would have to audit the archived blockchain copies. But if they do that, they would discover that some of the blockchains have been tampered with and so are completely unreliable.
Now…if there are 1000 archival nodes and 1 is corrupted and the other 999 are clean, maybe the network simply decides to ignore Bob. But what if Bob managed to corrupt many archival nodes, e.g. a supermajority? Then it would be very difficult for the network to dismiss his claim…
So to summarize my question (2): what is to stop a bad actor from corrupting archival nodes? Are there many archival nodes, too many for Bob to reasonably corrupt? Are their locations secret?
Welcome to Algorand!
(1) There are around 100 relays. Their IP addresses are public because other nodes need to look them up. So their approximate physical location is not really secret and cannot be secret. See The default list of Relay Nodes, for example, for how to get the list.
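As a rough illustration of why relay addresses cannot be secret: nodes discover relays through public DNS SRV records, so anyone can list them. The sketch below only parses SRV answers in the "priority weight port target" form that `dig +short SRV <name>` prints. The SRV name and the sample records are assumptions for illustration, not data taken from this thread, and the helper names are hypothetical.

```python
from typing import List, NamedTuple

# Assumed SRV name for MainNet relays; verify it before relying on it.
RELAY_SRV_NAME = "_algobootstrap._tcp.mainnet.algorand.network"


class Relay(NamedTuple):
    host: str
    port: int


def parse_srv_answers(text: str) -> List[Relay]:
    """Parse lines of the form 'priority weight port target'."""
    relays = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not a 4-field SRV answer.
        if len(parts) != 4 or not parts[2].isdigit():
            continue
        _priority, _weight, port, target = parts
        # DNS targets usually end with a trailing dot; drop it.
        relays.append(Relay(host=target.rstrip("."), port=int(port)))
    return relays


# Made-up sample records, not real relay hostnames.
SAMPLE = """\
1 1 4160 relay-a.example.net.
1 1 4160 relay-b.example.net.
"""

if __name__ == "__main__":
    for relay in parse_srv_answers(SAMPLE):
        print(relay.host, relay.port)
```

To try it against live data, you could paste the output of a DNS SRV query for `RELAY_SRV_NAME` into `parse_srv_answers`.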
(2) When you query the blockchain (to get any information about past transactions / balances / …), you need to query a node. For transactions older than the last 1000 rounds, the node needs to be archival. You absolutely need to trust the node you query. If you don’t want to have to trust anyone (but the Algorand software itself), the best option is to run your own node and to route all your queries to it.
If you query any node you don’t operate, you need to trust the node provider not to tamper with the answer. This is an issue for any blockchain.
From the point of view of security when querying the blockchain, since you only query a given node, the only thing that matters is that this given node is trustworthy. If there are 1000 nodes you can query and 999 are honest but 1 is not, what matters is that you query one of the honest nodes. There is no way we can prevent a malicious node provider from returning wrong information. This is the case for all blockchains.
However, note that when a node is started, it syncs with the network by requesting blocks from relays (relays are archival). As it does so, it verifies each block. Algorand is designed so that forking is impossible (assuming 80% of the participating stake is honest). Therefore, a node that syncs from scratch will always have the right blockchain, even if all but one of the relays are malicious and provide wrong blocks. If all relays are malicious, the node may not be able to fully sync (it may stop at a block before the latest one), but even then it will not contain any wrong blocks.
So when you run your own node that you sync from scratch, you are certain that you are seeing the right blockchain (albeit maybe not up to the last block) even if all the relays are malicious.
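To make the "it verifies each block" step a bit more concrete, here is a toy sketch of how a syncing node can notice that an archived copy was edited. It only models hash links between blocks; the `Block` type and helper names are hypothetical. Real Algorand verification also checks a certificate of committee votes for every block, which this sketch does not attempt to model.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    round_number: int
    prev_hash: str
    payload: str

    def digest(self) -> str:
        data = f"{self.round_number}|{self.prev_hash}|{self.payload}".encode()
        return hashlib.sha256(data).hexdigest()


def build_chain(payloads: List[str]) -> List[Block]:
    """Build a toy chain where each block commits to the previous one."""
    chain: List[Block] = []
    prev = GENESIS_PREV
    for i, payload in enumerate(payloads):
        block = Block(i, prev, payload)
        chain.append(block)
        prev = block.digest()
    return chain


def first_broken_link(chain: List[Block]) -> Optional[int]:
    """Return the round whose prev_hash does not match, or None if all links hold."""
    prev = GENESIS_PREV
    for block in chain:
        if block.prev_hash != prev:
            return block.round_number
        prev = block.digest()
    return None
```

If Bob edits the payload of round 1 in his copy, the link check fails at round 2, because round 2 still commits to the original round 1. In this toy model Bob could recompute every later hash to hide the edit; in Algorand he would additionally have to forge the certificates for those blocks, which is what the honest-stake assumption rules out.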
Check it yourself: DNS Lookup - Check All DNS Records for Any Domain
Also you might be interested in this website: https://metrics.algorand.org/
Each relay is an archival node, so there are at least as many archival nodes as relay nodes.
I hope this will change in the future, as the relay function is not the same as the data persistence function, and the costs of these two functions should be optimized separately.
Btw, even if a single node is corrupted and you try to fetch data from it, this will not succeed, because each node verifies all the data it receives. You will simply not reach consensus on building a new block, so your node will stop at that point and try to fetch the data a little later from someone else.
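The "stop and try someone else" behaviour can be sketched as a small sync loop. This is a toy model under loud assumptions: a block is just a `(prev_hash, payload)` tuple, peers are plain functions, and the `certified_hashes` set is a stand-in for checking a block's certificate, which a malicious peer is assumed unable to forge. None of these names come from the Algorand software.

```python
import hashlib
from typing import Callable, List, Optional, Set, Tuple

Block = Tuple[str, str]                   # (prev_hash, payload)
Peer = Callable[[int], Optional[Block]]   # returns the block for a round, or None


def block_hash(block: Block) -> str:
    prev_hash, payload = block
    return hashlib.sha256(f"{prev_hash}|{payload}".encode()).hexdigest()


def make_chain(payloads: List[str], genesis_hash: str) -> List[Block]:
    """Build a toy honest chain starting from genesis_hash."""
    chain: List[Block] = []
    prev = genesis_hash
    for payload in payloads:
        block = (prev, payload)
        chain.append(block)
        prev = block_hash(block)
    return chain


def sync(peers: List[Peer], genesis_hash: str,
         certified_hashes: Set[str], target_rounds: int) -> List[Block]:
    """Fetch blocks round by round, accepting only blocks that verify."""
    ledger: List[Block] = []
    tip = genesis_hash
    for rnd in range(target_rounds):
        for peer in peers:
            block = peer(rnd)
            if block is None:
                continue
            # Accept only a block that extends our tip and is "certified".
            if block[0] == tip and block_hash(block) in certified_hashes:
                ledger.append(block)
                tip = block_hash(block)
                break
        else:
            # No peer offered a valid block for this round: stall here.
            break
    return ledger
```

With one honest peer among the list, the ledger ends up identical to the honest chain. With only malicious peers, the ledger is merely shorter: it stops before the first forged block and never contains it.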
@fabrice and @scholtz: thank you both very much for your thoughtful responses!
@fabrice:
So their approximate physical location is not really secret and cannot be secret. See The default list of Relay Nodes for example for how to get the list.
Got it.
Are the locations of archival, non-relay nodes kept secret? I could imagine that being useful in the event of a community-wide crisis of confidence. e.g. imagine bad actor Bob wants to erase record of his debt which appears early in the blockchain (way before 1000 rounds ago, say) and manages to corrupt the archived blockchain of all publicly known such archives at some block before the record of his debt. So Bob is a really bad, resourceful dude; I don’t doubt that a strong majority, I daresay all, node operators operate in good faith, but Bob is the one who worries me…
…but Bob can’t corrupt not-publicly-known archived blockchain copies. So even if Bob causes this a priori crisis of confidence by corrupting all publicly known archived blockchain copies, e.g. the archived copies stored at all relay nodes, anyone operating a secret (and correct) archived blockchain copy could broadcast their copy to the network and restore faith.
Of course, if Bob is really such a bad dude, you might then worry that Bob corrupts the restored archived copies at the relay nodes. And again. And again. And … Is there any way to prevent this sort of attack, by a terrible actor like Bob, on all blockchains archived at publicly known locations?
From the point of view of security when querying the blockchain, since you only query a given node, the only thing that matters is that this given node is trustworthy. If there are 1000 nodes you can query and 999 are honest but 1 is not, what matters is that you query one of the honest nodes. There is no way we can prevent a malicious node provider from returning wrong information. This is the case for all blockchains.
I see your point. But what if Bob actually manages to corrupt every relay node? There are only 106 listed at the link scholtz sent.
The scenario I’m asking about might be kind of far-fetched. And I agree these are issues every blockchain would face. But I’m trying to understand our community’s stance on this issue for our own blockchain.
@scholtz
I hope this will change in the future, as the relay function is not the same as the data persistence function, and the costs of these two functions should be optimized separately.
Good point. As above, I think that secret archived copies of the blockchain could also serve a useful purpose from a security point of view.
It is technically not possible to corrupt everybody, and if someone succeeded, then any new node that tried to sync from the start using the official software would get stuck at some block.
And if everybody were corrupted, there would be no reason to use the blockchain… Those who hold large stakes in Algorand do not want the ALGO price to go to zero… They would each lose millions of dollars.
To change the consensus protocol, I think 90% of all nodes have to be upgraded and accept the new consensus version.
To complement @scholtz’s response:
In the very, very unlikely scenario where all relays are corrupted (which includes relays run by the Foundation and Algorand Inc, who have very strong vested interests in protecting them), the blockchain would just stall (assuming 80% of the stake is honest). There is no risk of a fork or of bad/fake transactions being included.
(This is assuming you, yourself, connect to a trustworthy node.)
Anyone with a non-corrupted archival node could start a relay and restart the blockchain by convincing people to connect to it. I believe that most node providers keep regular snapshots of their relays anyway. And any user who needs historical data and does not want to use a public API also runs an archival node whose location is unknown.
So, even in the situation where all public relay nodes are dishonest (which again is extremely implausible), there are many valid copies of the blockchain in many different locations in the hands of many different parties. Any one of these copies is sufficient to restart Algorand. And it is possible to check the validity of any of these copies (any tampering can be detected).