Algorand Catchpoints

Hi, I wanted to ask about the catchpoints made publicly available here:

https://algorand-catchpoints.s3.us-east-2.amazonaws.com/channel/mainnet/latest.catchpoint

What is the hash/sequence of letters that follows the # sign after the round number?

Context: Trying to generate my own catchup points.

The alphanumeric character sequence after the “#” character is the hash of the accounts database that corresponds to the round number provided before the “#” character.
If you’re running a non-archival node, you can set the CatchpointTracking in the config.json file to 1 in order to enable your node to generate the catchpoint labels.
If you want your node to generate the catchpoint files, you’ll need an archival node.

Catchpoint file vs labels… I’m a bit confused here, what’s the difference?

Also CatchpointTracking isn’t documented as a node option in Node configuration settings - Algorand Developer Portal

All I see is CatchpointFileHistoryLength and CatchpointInterval

  • Catchpoint label is the text “18630000#7K5346GSYMSQ7OA6IVNCRRGS5YLJPNBFUYNDQOBAQ6I4CW4OTUEQ”.
  • A catchpoint file is a large file, containing all the accounts for a given round. When you perform a fast catchup, your node downloads the catchpoint file from one of the relays and confirms it matches the provided hash.
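
To make the label format concrete, here is a minimal Go sketch that splits a label into the round number and the hash. The function name parseCatchpointLabel is hypothetical and not part of go-algorand; it only assumes the "<round>#<hash>" layout described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCatchpointLabel splits a label of the form "<round>#<hash>".
// Hypothetical helper for illustration; not a go-algorand API.
func parseCatchpointLabel(label string) (uint64, string, error) {
	roundStr, hash, found := strings.Cut(strings.TrimSpace(label), "#")
	if !found || hash == "" {
		return 0, "", fmt.Errorf("label %q is not of the form <round>#<hash>", label)
	}
	round, err := strconv.ParseUint(roundStr, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid round in label %q: %w", label, err)
	}
	return round, hash, nil
}

func main() {
	// The label quoted in this thread.
	round, hash, err := parseCatchpointLabel("18630000#7K5346GSYMSQ7OA6IVNCRRGS5YLJPNBFUYNDQOBAQ6I4CW4OTUEQ")
	if err != nil {
		panic(err)
	}
	fmt.Println("round:", round)
	fmt.Println("hash: ", hash)
}
```

The round is what the node sends to the relay when requesting the file; the hash is only used locally to check what was downloaded.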

The CatchpointFileHistoryLength setting is used by archival nodes and relays to configure how many catchpoint files they keep (since these files tend to be big, you might want to avoid keeping all of them).

Hi @tsachi, sorry, I got distracted, so I’m doubling back on this. I’m running an archival node.

In the process of running non-archival nodes, would it be possible for me to directly reference the file when trying to do fast catchup? Something like goal node catchup <path to file>

While this is an interesting idea, the node is not designed to work that way. Catchpoint files are stored under testnet-v1.0/catchpoints, and the relay node knows how to stream these to the node.
The node requests a catchpoint file from the relay by providing the round number only. It uses the hash to verify the validity of the content.
All the catchpoint files generated across the network are designed to be identical.

So the only way for a non archival node to perform fast catchup is to get the fast catchup data via a relay node? This design seems kind of limiting.

I have an archival node with the following settings:

    "CatchpointFileHistoryLength": 1,
    "CatchpointInterval": 2000

I want to be able to quickly spin up a node and fast catchup in under a minute. The 10k block interval is too long for me, so I want a shorter interval (2k blocks instead). Is there no way for me to catch up this fast?

It seems like generating catchpoint files for anything BUT a relay node is pointless?

Admittedly, there is a way to catchup from an archival node. But you’ll need to do some tweaking:
Configure the node as a relay by setting the NetAddress field.

Once a catchpoint file is generated, you’ll be able to observe the catchpoint label in the “goal node status”.

Then, when starting the non-archival node, use the “-p <archival node gossip address>” option, and perform a fast catchup from there.

I’d suggest configuring CatchpointFileHistoryLength to keep more than a single file.
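
As a rough sketch of the archival-node side of this setup, the Go snippet below renders the config.json keys mentioned in this thread. The values are assumptions for illustration: ":4160" is just an example listen address, and a history length of 5 simply stands in for "more than a single file". The relayConfig type and renderRelayConfig function are hypothetical names, and a real config.json may contain other keys that should be kept.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// relayConfig holds only the config.json keys discussed in this thread.
// Hypothetical type for illustration; not the go-algorand config struct.
type relayConfig struct {
	Archival                    bool   `json:"Archival"`
	NetAddress                  string `json:"NetAddress"`
	CatchpointInterval          uint64 `json:"CatchpointInterval"`
	CatchpointFileHistoryLength int    `json:"CatchpointFileHistoryLength"`
}

// renderRelayConfig returns the settings as indented JSON text.
func renderRelayConfig(cfg relayConfig) (string, error) {
	out, err := json.MarshalIndent(cfg, "", "    ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	cfg := relayConfig{
		Archival:                    true,
		NetAddress:                  ":4160", // assumed example address; setting it makes the node act as a relay
		CatchpointInterval:          2000,    // the shorter interval asked about above
		CatchpointFileHistoryLength: 5,       // assumed value; anything above 1 follows the suggestion
	}
	text, err := renderRelayConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

The printed fragment would be merged into the archival node's config.json by hand; the non-archival node is then started with the “-p” option pointing at that address, as described above.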

Great! This will work for me. Will this however automatically expose my archival node to accept incoming connections from all nodes? I could potentially restrict incoming traffic via a firewall/security group, but I’d only want my non-archival nodes to access my archival node.

No - it won’t expose your node in any way. The only way for you to “expose” it is to have your relay listed as one of the SRV records. As long as that’s not the case, consider this to be your own private relay…

Perfect! Figured I’d also ask 2 more questions since we’re discussing catchpoints. I found this function in ledger/catchpointtracker.go

func catchpointRoundToPath(rnd basics.Round) string {
    irnd := int64(rnd) / 256
    outStr := ""
    for irnd > 0 {
        outStr = filepath.Join(outStr, fmt.Sprintf("%02x", irnd%256))
        irnd = irnd / 256
    }
    outStr = filepath.Join(outStr, strconv.FormatInt(int64(rnd), 10)+".catchpoint")
    return outStr
}

I was confused by the catchpoint folder structure, but after discussing with a colleague, it seems like you’re aiming to sort of replicate a tree structure on disk to optimize for space. Is this actually the case and why not consider a flat file structure like:

catchpoints/<round 1>.catchpoint
catchpoints/<round 2>.catchpoint

Catchpoint files are tiny (1 MB). Any sort of optimization here feels unnecessary over the simplicity of having a flat file structure.

My second question is regarding the CatchpointFileHistoryLength option. I have this option set to 1, but my node still keeps older catchpoints. Why is this? Not sure if it’s a bug, but CatchpointFileHistoryLength doesn’t seem to prune older catchpoints.

The reasoning for the tree-directory structure under the “catchpoints” directory is that on certain file systems, storing a large number of files in a single directory makes access slower.
This is not the case when you have several tens or hundreds of files, but it’s definitely the case when the file count is in the thousands.
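
To see what this layout looks like in practice, here is a standalone Go sketch that mirrors the catchpointRoundToPath function quoted earlier in the thread. It uses uint64 in place of basics.Round so that it compiles on its own; the name catchpointRelPath is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// catchpointRelPath mirrors the quoted catchpointRoundToPath logic, with
// uint64 standing in for basics.Round (an assumption for this sketch).
func catchpointRelPath(rnd uint64) string {
	irnd := rnd / 256 // drop the low byte of the round number
	out := ""
	for irnd > 0 {
		// One directory level per remaining byte, least significant first.
		out = filepath.Join(out, fmt.Sprintf("%02x", irnd%256))
		irnd /= 256
	}
	return filepath.Join(out, strconv.FormatUint(rnd, 10)+".catchpoint")
}

func main() {
	for _, rnd := range []uint64{2000, 18630000} {
		fmt.Println(rnd, "->", catchpointRelPath(rnd))
	}
}
```

On a system with “/” path separators, round 2000 maps to 07/2000.catchpoint and round 18630000 maps to 45/1c/01/18630000.catchpoint. Each directory level is named after one byte of the round number, so a directory never has more than 256 subdirectories.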

As for your other observation (catchpoint files are tiny) - that is not really the case. The size of the file depends (primarily) on the number of accounts stored on the blockchain. If you were to check that on mainnet, you’ll see that the catchpoint file can be pretty large.

There used to be a small bug related to the deletion of the old catchpoint files. I believe that the bug was already fixed, and the fix is currently in master. It should be released in the coming month, but I don’t know exactly when.

Yep. Catchpoint files on mainnet are around 800 MB. Makes more sense here. Thanks for the insight.