TestNet stalled?

#1

Hi,

It looks like there’s a problem with TestNet. The last block is 429967, and the network has been stalled for the last 15 minutes or so (Time since last block: 1093.7s). Is a network reset in progress, or is something else going on? Thanks in advance!

Regards,
Ivica

#2

OK, it seems that it started again after more than an hour. Thanks for fixing it!

Regards,
Ivica

#3

Thanks for checking on this, Ivica, and I’m sorry nobody responded sooner. There have been 2 stalls since the last update on Dec 12. Both were caused by one of the VMs running out of disk space (from not clearing out old builds and databases), and both were corrected once the stall was detected. The network recovered nicely, as expected.

Thanks again for reporting this and for participating in TestNet!

-david

#4

Hey David!

First of all, thanks for your response; I really appreciate it! I have a couple of questions about TestNet and this incident…

Network stalls are not a big deal; after all, I guess test networks exist to uncover problems before MainNet. What I don’t understand is how one down node can cause the whole network to stop. I see that there are about 20 nodes in the network, but most of those are probably just “nodes” that sync the blockchain from relays and do not participate in consensus?

Regarding that, is there a way to make my node participate in consensus (get elected)? It’s been up for a while and I have never received any block rewards. I couldn’t find anything in the available docs on how to stake some Algos and run my node as a validator. Is that even possible right now?

Also, what does an account’s “online/offline” status mean? I’ve generated a participation key and set one of the accounts online, but I’m not sure whether that is enough for my node to be elected, or whether it has nothing to do with it.

Looking forward to your answers, and thanks in advance!!!

Regards,
Ivica

#5

Thanks for the follow-up questions, Ivica -

TestNet currently has its initial stake divided among 21 wallets. The goal was to disperse the stake across enough nodes that we could reasonably expect to stay online, so as to avoid a single point of failure. This goal was compromised out of necessity, which resulted in a slightly-too-large stake being placed on a node that I directly control.

In the coming weeks, we’ll be releasing a new build and distributing the stake across more nodes, so this single point of failure will be eliminated.

is there a way to make my node participate in consensus

Yes, by having more stake you will be selected more often. With 1% of the stake you can expect to be selected about 1% of the time. If you are interested in having a larger stake and expect to have your node generally online, send me your public key (account ID) and I’ll send you a big enough chunk to be selected at least once an hour.
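The proportional-selection rule above can be made concrete with a little arithmetic. This is a minimal sketch under a simplified model that is an assumption, not Algorand’s actual VRF-based sortition: each round exactly one proposer is drawn, with probability equal to its share of the online stake, and rounds are independent and take a fixed time. The 4.5-second round time (`ROUND_SECONDS`) and all function names are hypothetical.

```python
# Simplified model (assumption): one proposer per round, chosen with
# probability equal to its share of online stake; rounds are independent.
ROUND_SECONDS = 4.5  # hypothetical round time, not a measured TestNet value


def rounds_per_hour(round_seconds: float = ROUND_SECONDS) -> float:
    """Number of rounds that fit into one hour."""
    return 3600.0 / round_seconds


def expected_selections_per_hour(stake_fraction: float,
                                 round_seconds: float = ROUND_SECONDS) -> float:
    """Expected number of times an account is selected per hour."""
    return stake_fraction * rounds_per_hour(round_seconds)


def prob_selected_within_hour(stake_fraction: float,
                              round_seconds: float = ROUND_SECONDS) -> float:
    """Probability of being selected at least once in an hour."""
    n = rounds_per_hour(round_seconds)
    return 1.0 - (1.0 - stake_fraction) ** n


def fraction_for_hourly_selection(round_seconds: float = ROUND_SECONDS) -> float:
    """Stake share that gives one expected selection per hour."""
    return 1.0 / rounds_per_hour(round_seconds)


if __name__ == "__main__":
    print(expected_selections_per_hour(0.01))  # 1% of the stake
    print(prob_selected_within_hour(0.01))
    print(fraction_for_hourly_selection())
```

Under these assumed numbers, a 1% share yields 8 expected selections per hour, and a share of 1/800 (0.125%) yields one expected selection per hour. Even then there is still roughly a 37% chance of going a full hour without being selected, which is why a somewhat larger chunk than the bare minimum helps.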

And yes, what does account “online/offline” status mean

Online means your account will be considered available to contribute toward consensus: your stake is counted as part of the entire voting stake and will be included when proposers/committees are selected. To actually participate, your account must be hosted on a node connected to the network and must have a valid participation key. Offline means you don’t intend to host your account on a connected node, so it shouldn’t be counted as part of the voting stake.
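The distinction between online and offline stake can be sketched as follows. This is a minimal illustration with a hypothetical `Account` record, not the real ledger format: only accounts marked online contribute to the voting stake, and an account’s share of selections is its stake divided by that voting stake.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str     # hypothetical label for illustration
    stake: int    # balance in arbitrary units
    online: bool  # True if registered online with a participation key


def voting_stake(accounts: list[Account]) -> int:
    """Sum of stake over accounts marked online; offline stake is excluded."""
    return sum(a.stake for a in accounts if a.online)


def selection_share(account: Account, accounts: list[Account]) -> float:
    """Fraction of the voting stake held by `account` (0.0 if offline)."""
    total = voting_stake(accounts)
    if not account.online or total == 0:
        return 0.0
    return account.stake / total


if __name__ == "__main__":
    accounts = [
        Account("alice", 50, online=True),
        Account("bob", 30, online=True),
        Account("carol", 20, online=False),
    ]
    print(voting_stake(accounts))                  # 80
    print(selection_share(accounts[0], accounts))  # 0.625
    print(selection_share(accounts[2], accounts))  # 0.0
```

This sketch counts an online account toward the voting stake whether or not its node is actually running, mirroring the description above: marking an account online declares that it intends to participate, but does not prove that it is participating.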

Note that rewards are currently disabled and their implementation is being finalized, so there are no ‘block rewards’ right now. You’ll need to check the blockchain history to see whether you proposed any winning blocks.

If you marked an account online with valid participation keys, then you just need more stake to be selected in a reasonable amount of time.

Hope this helps.

-david

#6

So there is a single point of failure because one (or more?) node holds too much stake? What if this happens in the real world, on MainNet?

Besides, how can a decentralized, scalable, secure infrastructure have a single point of failure?

#7

This happened because we haven’t built out our TestNet network enough to spread the stake out, and circumstances led me to put too much stake on one node. In ‘the real world’ we will not have significant stake on a single node. Our stake will be widely distributed so that there is no single point of failure, and we’ll rely on the bulk of the stake being sufficiently distributed to keep the network running.

Our infrastructure has a single point of failure while we’re still in development. TestNet is not representative of the MainNet infrastructure, at least not yet.

In the coming days and weeks we will be building out the network; it may even be entirely replaced with a new topology and stake distribution.

#8

“too much stake on one node”

  • Was this more than one third of the total stake?

#9

It was a large enough share of the total online stake that, together with other online stake on nodes that weren’t running at the time, losing it stalled the network.
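Why online stake on nodes that aren’t running can halt the network is easiest to see with a toy liveness check. This is a minimal sketch assuming a classical BFT-style rule in which agreement needs votes from more than two-thirds of the online stake. Algorand’s real thresholds are applied to sortition-selected committees and differ in detail, and the account names and stake amounts here are hypothetical.

```python
from fractions import Fraction

# Assumed BFT-style quorum: strictly more than 2/3 of online stake must vote.
QUORUM = Fraction(2, 3)


def can_make_progress(online_stakes: dict[str, int], running: set[str]) -> bool:
    """Return True if the stake of online accounts whose nodes are running
    exceeds the assumed quorum fraction of all online stake."""
    total = sum(online_stakes.values())
    live = sum(s for name, s in online_stakes.items() if name in running)
    return total > 0 and Fraction(live, total) > QUORUM


if __name__ == "__main__":
    stakes = {"big": 30, "b": 20, "c": 20, "d": 15, "e": 15}
    everyone = set(stakes)
    print(can_make_progress(stakes, everyone))                 # True
    print(can_make_progress(stakes, everyone - {"big"}))       # True (70% live)
    print(can_make_progress(stakes, everyone - {"big", "d"}))  # False (55% live)
```

In this toy setup, losing the largest node alone still leaves 70% of the online stake live, which clears the assumed quorum. Losing it together with one more account that is registered online but not running drops the live share to 55%, and progress stops.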

#10

Thanks @David

Unfortunately, this is a bug in Algorand, as I described in this Medium post: https://medium.com/@rajeshbhaskar/algorand-may-be-broken-d1d2c2542064. This bug cannot be fixed under the current design of Algorand and needs a new approach, which I have outlined in the article.

Can you take this up as a bug submission? Do you have a bounty program?

Thanks/Rajesh

#11

I don’t appreciate you misrepresenting a test environment configuration as indicative of, and proof for, some fundamental flaw in our approach. TestNet is not expected to be resilient yet and is not intended to represent even an early incarnation of MainNet. To call the ‘single point of failure’ any sort of proof is ludicrous and not helpful.

#12

Hi @David, I diagnosed this issue with Algorand months ago, and it is not specific to your TestNet, so there is no misrepresentation here. I reached out to your team back then as well.

It certainly appears that TestNet is running into the issue outlined in that post. The flaw clearly lies in nodes with sufficient stake going offline.

Do you believe that this is not a bug with Algorand?