Next immediate major challenge: Blockchain Size

Congratulations to the Algorand teams (Inc. and Foundation) and the community for all the recent exciting updates.

With the network supporting 6K TPS and the current adoption rate, averaging 1K TPS over the next year or two is a real possibility. There is a major issue, though. The data storage (and network bandwidth) needed by archival/relay nodes will grow significantly, and the current infrastructure is unable to handle sustained higher transaction volume. There are a few bottlenecks, but the major one is the size of the blockchain data, which needs to be written to and read from relay nodes.
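As a rough sanity check on the storage concern, here is a back-of-the-envelope sketch. The ~250-byte average transaction size is an illustrative assumption, not an official figure; actual size varies with transaction type and note-field usage:

```python
# Back-of-the-envelope estimate of raw ledger growth at sustained throughput.
# ASSUMPTION: ~250 bytes per transaction on average (illustrative only).

def daily_growth_gb(tps: float, avg_txn_bytes: int = 250) -> float:
    """Gigabytes of raw transaction data written per day at a given TPS."""
    seconds_per_day = 86_400
    return tps * avg_txn_bytes * seconds_per_day / 1e9

# At a sustained 1,000 TPS:
print(f"{daily_growth_gb(1_000):.1f} GB/day")               # ~21.6 GB/day
print(f"{daily_growth_gb(1_000) * 365 / 1e3:.1f} TB/year")  # ~7.9 TB/year
```

Even under this modest per-transaction size, sustained 1K TPS adds tens of gigabytes per day, which is the core of the archival/relay storage concern.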

One aspect of this problem is technical. Vault, State Proofs, partitioning, and other innovative technical ideas can help here. However, as adoption rapidly increases, there should be a timeline for when these capabilities will be enabled. It would be great if the Algorand team provided some clarification on its plans.

Another aspect of the problem is whether storing and replicating all of these transactions across tens of servers is economical. Probably not! Right now it is not a concern, because more transactions mean more adoption; however, at some point the system will need adjustments. Current transaction fees are too low and don’t cover storage-over-time costs. Ideally, with more adoption, higher-value use cases will replace lower-value use cases on the chain. However, there should also be mechanisms to clean dead assets, transactions, etc. off the blockchain. I haven’t seen any comprehensive public discussion of these topics. Any clarity on plans here would be helpful.

On Reddit


I agree… we need to split the relay function of the node from the archival function of the node.

Also note that most of the load on algod nodes comes from indexers… They process each new block and write all the data to PostgreSQL… They require a full archival node to operate. The indexer holds more data than algod, and it is the gateway to the blockchain.

Right now the indexer does not offer any WebSocket feeds, which is not optimal. RPC feeds on other blockchains have been WebSocket-based for a long time.

Regarding KMD nodes, they do not require full archival nodes; however, Algorand lacks an API to bring an account online… I recently wrote KMD specs with ARC-0014 auth, which makes this possible, but I don’t see any support for this from the Algorand side yet.

I made a public git database of the nodes - GitHub - scholtz/AlgorandPublicData: Algorand Public Data project provides public json data about algorand nodes and projects

KMD openapi specs: GitHub - scholtz/AlgorandKMDServer

Implementation to AWallet: GitHub - scholtz/wallet: Open source algorand wallet and algorand web tools - governors tools, payment gateway, ..

Disclaimer… I am one of the community relay node runners.


“there should also be mechanisms to clean up the blockchain from dead assets, transactions, etc.”

All of this really boils down to the incentive structure for nodes (especially relay nodes). There is only one blockchain to date that claims to have solved this incentive structure.

Algorand could perhaps adopt similar tech, assuming Saito’s claims are correct.

Well… any editing of past blocks will result in hash corruption and the inability to derive the current hash from the origin… there will always need to be some node which holds all the data, even removed data… so no way for algod…
For the indexer it might perhaps be possible, but there are still use cases which prevent this, for example voting or auditing of data on the blockchain… If person A votes with token T, and person B votes as well, the vote result is calculated from the blockchain data. If we decide to delete token T, we still need to be able to check in 100 years that the vote results were calculated correctly… right?

This could be solved by giving the indexer a setting to store only a limited number of transactions, for example the last 1 million blocks, or by customizing the indexer to watch only a specific asset…

On the algod side, I believe relay nodes should not act as archival nodes, as these are two separate functions with different resource requirements… Relays should have a fast network, while archival nodes should be able to compress the data and store it on non-SSD drives…

It’s up to the user to do it. You can remove assets, opt out of contracts, and clear state. You can also delete applications. We should encourage users and devs to clean up after themselves.

In this comment on Reddit, I wrote that someone can spend less than $400 and increase the blockchain size by 1GB. I haven’t received any response from the Algorand team on whether there is any mechanism in place to prevent it.

Now, the question is whether someone can increase the blockchain size by 1TB or a few TBs in a few days by spending a couple hundred thousand dollars. $400K or even $4M is nothing for a large-scale trader who is short ALGO, or for a competitor who benefits from damage to Algorand’s reputation. If this is possible, then it is a huge risk to the Algorand network. It is extremely important to ensure that Algorand continues to be a blockchain with ZERO downtime.
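A rough sketch of the attack economics behind the $400-per-GB figure, using the protocol’s 0.001 ALGO minimum transaction fee and the 1 KB note-field cap. The ALGO price used here is an illustrative assumption, not a quote:

```python
# Rough fee cost for an attacker to bloat the ledger with note-field spam.
# Known protocol parameters: min fee = 0.001 ALGO per txn, note field <= 1024 bytes.
# ASSUMPTION: the $0.35/ALGO price is illustrative only.

MIN_FEE_ALGO = 0.001
NOTE_BYTES = 1024  # maximum note field size per transaction

def bloat_cost_usd(target_gb: float, algo_price_usd: float = 0.35) -> float:
    """Fee cost in USD to add target_gb of note data at the minimum fee."""
    txns_needed = target_gb * 1e9 / NOTE_BYTES
    return txns_needed * MIN_FEE_ALGO * algo_price_usd

print(f"1 GB:  ${bloat_cost_usd(1):,.0f}")     # a few hundred dollars
print(f"1 TB:  ${bloat_cost_usd(1000):,.0f}")  # a few hundred thousand dollars
```

This ignores per-transaction overhead beyond the note field (which makes the bloat slightly cheaper per byte for the attacker), but it matches the order of magnitude quoted above.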

In this post, I predicted that a large-scale trader/institution could attack Luna and make billions. It happened a few months later!

There are limits directly in the consensus parameters… there is a maximum number of bytes per block, a maximum size for the transaction note field, and perhaps some others.

Algorand has a limit of 1MB per block, and block time is now 3.6 seconds… so under attack it is possible to increase the blockchain size by 1GB in one hour.

But the algod devs should probably focus on splitting old data from live data, so that old data can, for example, be shared among several instances and does not have to sit on fast drives.


With the 1MB block size limit, there will be buffer time to react appropriately. So the maximum blockchain size increase in a year, given the current parameters, is roughly 8.8TB. Thanks for the clarification.

With the latest upgrade, the block size limit is 5MB.
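The worst-case yearly ceiling discussed above can be sketched as follows, assuming every round is completely full and a 3.6-second round time (actual round times vary slightly):

```python
# Worst-case ledger growth if every block is filled to the size cap.
# ASSUMPTION: 3.6 s average round time; real round times fluctuate.

SECONDS_PER_YEAR = 365 * 24 * 3600

def max_yearly_growth_tb(block_limit_mb: float, round_time_s: float = 3.6) -> float:
    """Upper bound on ledger growth per year, in TB, for a given block cap."""
    blocks_per_year = SECONDS_PER_YEAR / round_time_s
    return blocks_per_year * block_limit_mb / 1e6  # MB -> TB

print(f"1 MB blocks: ~{max_yearly_growth_tb(1):.1f} TB/year")  # ~8.8 TB
print(f"5 MB blocks: ~{max_yearly_growth_tb(5):.1f} TB/year")  # ~43.8 TB
```

Note that raising the block cap to 5MB raises this worst-case ceiling fivefold, which is exactly the trade-off the trilemma comment below is pointing at.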

@awesomecrypto what you’re describing are side effects from what’s known as the (scalability) trilemma. Messing with parameters to increase block size and decrease block time seems like an easy way to scale, but it’s naive in that it comes with tradeoffs such as storage problems and more intensive resource requirements to run a node. In turn, it sacrifices both decentralization and security to a degree. Vitalik writes about it here: The Limits to Blockchain Scalability


The good news is that we just need enough archival nodes, since with State Proofs it wouldn’t matter whether we trust them, because they’ll have to prove it.

I think this is something that “vaults” were supposed to fix, but I’m not sure what happened to them. Here is a link for context: link.

That article is nearly four years old! It would be great to hear a current update on this challenging issue of blockchain bloat.