How do blocks fill during network stress?

Hello everyone,

I’m a PhD student in the Computer Science department at USF. My current research focuses on performance modeling and evaluation of public blockchains, specifically Algorand.

First, I want to express my gratitude for this community. There are highly technical and valuable posts here, and I have learned a lot.

Currently, I am conducting an experiment, specifically a stress test on Algorand. I am analyzing how blocks fill during periods of high traffic. My goal is to determine whether all 10,000 transactions sent to the network will be included in the next block (given its capacity for 10,000 transactions), or if they will be spread across several upcoming blocks.

I would greatly appreciate your insights and experiences in this area.

Thank you.


A single node likely won’t let you push transactions into it fast enough to fill a block like that, just due to the inefficiencies of the REST API and the rate limiting that will likely kick in. You’ll likely need to hit multiple nodes (either through a load balancer or explicit configuration), have injectors that properly use multiple simultaneous (and truly parallel!) connections, and just slam transactions in from multiple senders to multiple nodes. You need to get traffic in from lots of sources to get the mempool high enough to really fill the blocks.
It shouldn’t be too difficult.
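
For illustration, the injector side of that could look roughly like this (an untested sketch, assuming py-algorand-sdk; the node URLs, token, and the pre-split batches of signed transactions are placeholders):

from concurrent.futures import ThreadPoolExecutor
from algosdk.v2client import algod

NODE_URLS = ["http://node1:8080", "http://node2:8080", "http://node3:8080"]
TOKEN = "your-algod-token"

clients = [algod.AlgodClient(TOKEN, url) for url in NODE_URLS]

def blast(client, txns):
    # Fire and forget: submit each signed transaction without waiting for confirmation.
    return [client.send_transaction(stxn) for stxn in txns]

def submit_parallel(batches):
    # One worker (and one connection) per node; the work is I/O-bound, so threads
    # are enough here. For true CPU parallelism, use separate processes or machines.
    with ThreadPoolExecutor(max_workers=len(clients)) as pool:
        futures = [pool.submit(blast, c, b) for c, b in zip(clients, batches)]
        return [txid for f in futures for txid in f.result()]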


Thank you for your helpful reply. I would like to share my observations.

EXPERIMENT:

I signed 1000 transactions and sent all of the signed transactions to the network at one time using Algod and kmd. I did this 30 times. My code to do this is Algorand_stress_tester/stress_tester/worker.py at main · moludesmaili/Algorand_stress_tester · GitHub

OBSERVATION:

Given a transaction burst size of 1000, I find that in 80% of the cases the 1000 transactions are spread across two blocks. I observe this by using a block explorer to find the transactions. Why are the transactions from a relatively small burst spread across two blocks when neither block is anywhere near full?

I observe the same phenomenon for smaller burst sizes. Is the burst being “spread out” in time due to an inherent bottleneck or some policy (e.g., enforced by a rate limiter)? If so, a pointer to the documentation would be helpful.

Thank you again.

Regards,

I don’t want to assume too much, but your sending code might be a bit simplistic, or you’re targeting a constrained node and not treating this as the distributed problem that it is. If you’re expecting a single node to act as a single entry point to full network block saturation, I doubt you’ll achieve it.
Treat it like you would a DDoS problem, but on multiple targets. Multiple senders, multiple targets (possibly just your own nodes behind a load balancer), with each sender properly using multiple (fully async and truly parallel) connections to submit transactions. Hopefully you’re not doing a send-then-wait-for-confirmation cycle for each of the 1000 transactions in your batch?

Configuration settings on the node should be documented here:


You will almost certainly need more than a single node for this, but given the results you are seeing, some of the delays are very likely on the client side.

Check out the methodology section here. My approach was:

  • sign everything in advance (minding the last valid rounds, transactions do expire)
  • save to files in batches equal to your number of nodes
  • transfer the transaction files to your nodes
  • submit them with the goal cli (goal clerk rawsend)

This will remove any delays from your end. For example, signing 1000 transactions does not take zero time, even if you are doing it multithreaded (and languages like JavaScript and Python will not be multithreaded by default).

By getting the files in place on your node(s) in advance, you are excluding delays from opening/closing HTTP connections, your own bandwidth/latency, etc.
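
For example, the pre-signing step could look roughly like this (an untested sketch, assuming py-algorand-sdk; the endpoint, token, and mnemonic are placeholders, and depending on SDK version the transaction module may live under algosdk.future.transaction):

from algosdk import mnemonic, transaction
from algosdk.v2client import algod

ALGOD_URL = "http://localhost:4001"
ALGOD_TOKEN = "your-algod-token"
SENDER_MNEMONIC = "your 25-word mnemonic ..."

client = algod.AlgodClient(ALGOD_TOKEN, ALGOD_URL)
sk = mnemonic.to_private_key(SENDER_MNEMONIC)
sender = mnemonic.to_public_key(SENDER_MNEMONIC)

sp = client.suggested_params()  # mind the first/last valid rounds
receiver = sender               # self-payments keep the example simple

signed = []
for i in range(1000):
    # A unique note keeps otherwise-identical self-payments from colliding on txid.
    txn = transaction.PaymentTxn(sender, sp, receiver, 0, note=i.to_bytes(8, "big"))
    signed.append(txn.sign(sk))

# goal should be able to read this file directly:
#   goal clerk rawsend -Nf batch.stxn
transaction.write_to_file(signed, "batch.stxn")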

These factors are likely part of what is limiting your results, which have to do with your methodology and not the network’s capabilities. The other aspect, as Patrick says, has to do with how many nodes on the network are broadcasting all those transactions. Broadcasting from a single node is not a realistic test of actual throughput, so using multiples is recommended.


Patrick, Thank you so much for your response!

In another phase of my experiment, I had already followed your suggestion precisely. I deployed 16 nodes on 16 AWS servers, with each node sending 5,000 transactions, for a total of 80,000 transactions sent to the network. However, during this stress test, the issue I previously mentioned got worse. Tracking the sent transactions with block explorers, I observed that they were spread across more than 10 non-consecutive blocks. This leads me to believe that there may be a performance bug within the Algorand network. Despite a burst of transactions more than three times the block capacity of 25,000 transactions, they were distributed across more than 10 blocks instead of being consolidated into 3-4 blocks. Additionally, I measured a maximum transaction confirmation time of 84 seconds, which significantly exceeds our expectations for Algorand’s confirmation time.

I greatly appreciate your insights on this matter.

I appreciate your helpful response; it was very insightful!

As you can see from my code, Algorand_stress_tester/stress_tester/worker.py at main · moludesmaili/Algorand_stress_tester · GitHub, I implemented the logic you suggested: the code signs and saves all 1000 transactions first, and then sends the whole batch to the network at once. The code also records the time taken for the entire batch to be sent and accepted by the network. However, despite this approach, even a small burst of 1000 transactions is spread across more than one block. Notably, in the stress test where 80,000 transactions were sent from 16 AWS servers, the transactions were spread across roughly 10 non-consecutive blocks.

Your insights on this matter are invaluable. Thank you once again.

You are welcome. You missed this part:

  • transfer the transaction files to your nodes
  • submit them with the goal cli (goal clerk rawsend)

Your code still submits each transaction individually over the HTTP API. Furthermore, I think it is doing so sequentially: while you have some multiprocess “plumbing” there, I think you are still only spawning a single process to send these.

In order to get them broadcast as tightly packed as possible, you should dump the transactions in binary format to a file, then transfer the file to a node (or break them up into multiple files and use multiple nodes) and send them all in a batch directly from the node, using this command:

goal clerk rawsend -Nf mytxns.dat

(the -N flag instructs not to wait for confirmations)

I’d try this approach with a single txn in a file (just to see if your process is working overall) and then split them up into 8 batches and try again from 8 nodes.
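
Something along these lines could do the splitting (untested sketch, with the same py-algorand-sdk and placeholder caveats as before):

from algosdk import transaction

N_NODES = 8
stxns = transaction.retrieve_from_file("batch.stxn")  # all pre-signed transactions

chunk = (len(stxns) + N_NODES - 1) // N_NODES
for i in range(N_NODES):
    part = stxns[i * chunk:(i + 1) * chunk]
    if part:
        # Copy each part to a different node, then on that node run:
        #   goal clerk rawsend -Nf part_<i>.stxn
        transaction.write_to_file(part, "part_{}.stxn".format(i))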

Another thing to keep in mind is that the interval over which you broadcast (start → stop of txn broadcast) will not line up perfectly with the block production interval, so if it takes you 1.5 seconds to broadcast all of them (start to finish), you will likely hit a “block boundary” and have some included in the next block and the rest in the block after that. If you broadcast a) fast enough and b) enough txns to fill 2-3 blocks, then this effect should not be present.
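
One way to quantify that spread after the fact is to tally the confirmed round of every transaction in the batch, e.g. (sketch, assuming py-algorand-sdk and that you kept the txids returned at submit time; a node only reports recently confirmed transactions on this endpoint, so an indexer query is more robust for older rounds):

from collections import Counter
from algosdk.v2client import algod

client = algod.AlgodClient("your-algod-token", "http://localhost:4001")

def round_histogram(txids):
    # Map each transaction to the round that included it; 0 means still pending (or dropped).
    rounds = Counter()
    for txid in txids:
        info = client.pending_transaction_info(txid)
        rounds[info.get("confirmed-round", 0)] += 1
    return rounds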

Filling blocks is quite hard (mostly limited by client/setup) which is why for my AMM test, I had to cut out the HTTP calls entirely and broadcast directly from multiple nodes on the network.

FYI for the txn types you are using, it looks like the theoretical limit at the moment is around 7400 TPS. See this twitter post from today. (see next post) The 10K TPS figure is possible when inner transactions are utilized.
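
As a rough back-of-the-envelope illustration of where a number in that ballpark can come from (all three constants below are my assumptions, not official parameters):

BLOCK_PAYLOAD_BYTES = 5 * 1024 * 1024  # assumed max transaction bytes per block
PAY_TXN_BYTES = 250                    # assumed size of a minimal signed payment txn
ROUND_SECONDS = 2.8                    # assumed average block interval

txns_per_block = BLOCK_PAYLOAD_BYTES // PAY_TXN_BYTES
print(txns_per_block, round(txns_per_block / ROUND_SECONDS))  # ~20971 txns/block, ~7490 TPS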

Finally, if you exhaust every other reason for this, note that the testnet infrastructure (relays, consensus nodes, etc) may not necessarily be as performant as mainnet.


The previous comment had incorrect calculations for max outer txn TPS, I think this is correct:

And in case it helps with your research, these are the largest blocks by transaction count at the moment:

rnd,txn_count
25836244,58888
25808869,51895
25836565,45323
25990224,44766
25990231,41958
25990219,40684
25990223,40175
25990221,36031
25808868,33952
25990238,32204
26101804,31250
26101805,31241
25990241,29077
25990237,27760
33909471,27451
34801062,27311
34801131,27236
33909213,27078
34801056,26754
34801032,26663
26101803,26418
25990222,26260
34801158,26246
34801114,26243
23593602,26197

Thank you so much for the information you provided; you raised some very interesting points. I apologize for the delay in responding, as I needed to recheck my results and my code in light of your information. As you mentioned, the ‘goal clerk rawsend -Nf mytxns.dat’ command will be much faster than the method I was using, with which I faced a 5-second delay in signing all 80,000 transactions, handing them to goal, and submitting them. However, I believe that after that 5-second delay, all the transactions had been received by the first node in the network, correct? So the method you mentioned can only shrink that delay to something under 5 seconds. But the delay I observed was much larger than 5 seconds: for some transactions, it took 84 seconds from the time I received the submission confirmation until block inclusion.

The other point is that, as you mentioned, the transactions were sent sequentially. However, upon checking block explorers, I found that some of the earliest-created transactions were confirmed very late.

Overall, the main question on my mind is mostly about the transaction confirmation time, not just filling the blocks. Why would it take more than a minute for some transactions to be confirmed and included in a block? In this regard, I believe the fact that the blocks are not filling is key to answering that question. After exploring all the results, my current assumption is that the gossip protocol is not synchronizing all the nodes and broadcasting all the pending transactions to them fast enough. I look forward to hearing from you if you have any ideas in this regard.
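
For reference, a rough way to measure that per-transaction confirmation time is to record the wall clock at submit time and compare it with the timestamp of the including block (sketch, assuming py-algorand-sdk; endpoint and token are placeholders):

import time
from algosdk.v2client import algod

client = algod.AlgodClient("your-algod-token", "http://localhost:4001")

def submit_and_time(stxn):
    sent_at = time.time()
    txid = client.send_transaction(stxn)
    # Poll until the node reports the transaction as confirmed.
    while True:
        rnd = client.pending_transaction_info(txid).get("confirmed-round", 0)
        if rnd:
            break
        time.sleep(0.2)
    block_ts = client.block_info(rnd)["block"]["ts"]  # block header timestamp, in seconds
    return txid, rnd, block_ts - sent_at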

Yes, as you mentioned, the testnet may not perform as well as the mainnet, but it is the best representation we have.

The data you provided on the largest block is very helpful for my research. I really appreciate it!

Thank you again, and I look forward to learning more from you in this regard.


Thank you once again for providing this information. I have two concerns regarding the data you provided. When I look these rounds up on the Algorand block explorer Dappflow, the transaction counts do not match the ones you sent me:

rnd,txn_count
25836244,264
25808869,295
25836565,395
25990224,222
25990231,230
25990219,234
25990223,239
25990221,189
25808868,196
25990238,204
26101804,139
26101805,129
25990241,149
25990237,112
33909471,27451
34801062,27311
34801131,27236
33909213,27078
34801056,26754
34801032,26663
26101803,148
25990222,148
34801158,26246
34801114,26243
23593602,26197

Is there any issue with this block explorer?

Additionally, how is it possible for an Algorand block to include more than 25,000 transactions? Isn’t there a fixed limit on block size? Could you explain how block size works in Algorand?


Hello @d13co . Thank you again for the solution you provided me. It helped a lot! I would like to share my experience with you.

When using goal clerk rawsend (goal clerk send) for sending transactions, any batch of 1000 or fewer transactions always appeared in one block. When I scaled up to 10,000 transactions, my batch was spread across four non-consecutive blocks in some cases. As one example, here are the blocks that included the transactions sent by me in one batch of 10,000:

40696332 : 9650 txns.

40696333 : 334 txns.

40696334 : 5 txns.

40696336 : 20 txns.

Some of my transactions experienced a 13-second confirmation time (response time). I cannot figure out why Algorand does not fit all 10,000 transactions into one block, or why the blocks are non-consecutive. Where might the bottleneck be? Might there be a bottleneck somewhere before the block proposal assembly phase?
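
(For reference, one way to inspect those rounds, including the skipped 40696335, would be something like the following sketch, assuming py-algorand-sdk and algod access on the same network; endpoint and token are placeholders:)

from algosdk.v2client import algod

client = algod.AlgodClient("your-algod-token", "http://localhost:4001")

for rnd in range(40696332, 40696337):
    blk = client.block_info(rnd)["block"]
    # "txns" is absent for empty blocks; "ts" is the block header timestamp.
    print(rnd, len(blk.get("txns", [])), blk.get("ts"))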

I am watching: https://www.youtube.com/watch?v=hbT2SmrouIA (and notably the figure at time 9:44) to better understand Algorand performance.

I am happy to share my code and any findings that I have. My goal is to build a queueing model of Algorand.

Thank you again for your insights.
