Earlier today we had a brief availability incident with the Randomness Beacon we run at the Algorand Foundation.
The backend service that submits VRF proofs was unavailable for about 23 minutes. After the service was restarted, the backlog of proofs was caught up within 11 minutes.
Data here:
Background
This service provides on-chain randomness for smart contracts to use, e.g. for lottery-style applications. It was brought in-house in 2024 as a cost-saving measure.
It is generally very reliable, and we take several precautions against exactly this kind of failure, but today we hit an edge case.
Cause
We submit these proofs from 3 threads of a single process, each going through an independent node provider. A configuration change caused a fatal error during shared startup, before the independent threads were spawned, which halted the service as a whole.
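To illustrate the failure mode, here is a minimal sketch, not our actual code: provider names, function names, and the error are all hypothetical. The point is structural: because configuration is loaded once in shared bootstrap code, an error there occurs before any of the three per-provider threads exists, so no thread can carry on independently.

```python
import threading

# Hypothetical provider names; the real service uses 3 independent node providers.
PROVIDERS = ["provider-a", "provider-b", "provider-c"]

def load_config():
    # Shared bootstrap step. A bad configuration value raises here,
    # BEFORE any submission thread has been spawned.
    raise ValueError("invalid endpoint in configuration")

def submit_proofs(provider, config):
    pass  # would submit VRF proofs via this provider's node

def main():
    config = load_config()  # fatal error here halts the whole process
    for provider in PROVIDERS:
        threading.Thread(target=submit_proofs, args=(provider, config)).start()
```

A single exception in `load_config` means zero of the three threads ever start, which is why redundancy across node providers did not help in this incident.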
Remediation
We will remediate this at the code level by adding extra guards during process bootstrap, so that a similar configuration error cannot take down the service as a whole.
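One way such a guard could look, as a hedged sketch rather than the actual fix: move configuration validation inside each provider's startup path, so a bad entry disables only that provider's thread while the others keep submitting proofs. All names and the config shape below are illustrative.

```python
import threading

# Hypothetical per-provider configuration; "provider-b" has a bad entry.
PROVIDERS = {
    "provider-a": {"endpoint": "https://a.example"},
    "provider-b": {"endpoint": None},
    "provider-c": {"endpoint": "https://c.example"},
}

def validate(provider, cfg):
    # Guard: validate each provider's config independently.
    if not cfg.get("endpoint"):
        raise ValueError(f"{provider}: missing endpoint")
    return cfg

def submit_proofs(provider, cfg):
    pass  # would submit VRF proofs via this provider's node

def start_threads():
    started = []
    for provider, raw in PROVIDERS.items():
        try:
            cfg = validate(provider, raw)
        except ValueError as exc:
            print(f"skipping {provider}: {exc}")  # alert instead of halting
            continue
        threading.Thread(target=submit_proofs, args=(provider, cfg)).start()
        started.append(provider)
    return started
```

With this structure, the bad "provider-b" entry is skipped (with an alert) and the other two threads still start, preserving proof submission through the remaining providers.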