Algorand node stops syncing after upgrading to version 3.15.1

Hi @fabrice,
The Algorand node status is stuck at the same block height after upgrading to 3.15.1, and catchup-time is increasing constantly:
{
  "catchpoint": "",
  "catchpoint-acquired-blocks": 0,
  "catchpoint-processed-accounts": 0,
  "catchpoint-processed-kvs": 0,
  "catchpoint-total-accounts": 0,
  "catchpoint-total-blocks": 0,
  "catchpoint-total-kvs": 0,
  "catchpoint-verified-accounts": 0,
  "catchpoint-verified-kvs": 0,
  "catchup-time": 1602302324241,
  "last-catchpoint": "29110000#FI2DUXOJUME3GMA6C2TIQOV2KWT7BOKKJGQCHP7UF6JIO2XIYEKA",
  "last-round": 29117368,
  "last-version": "https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6",
  "next-version": "https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6",
  "next-version-round": 29117369,
  "next-version-supported": true,
  "stopped-at-unsupported-round": false,
  "time-since-last-round": 0
}
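A quick way to confirm the symptom is to sample the status twice and check whether last-round moved. The following is a minimal sketch, not an official tool: the helper names last_round and is_stalled are made up for illustration, and the commented usage assumes the /v2/status endpoint is reachable with the token and address files in the data folder.

```shell
#!/usr/bin/env bash

# Extract the numeric "last-round" field from a status JSON document on stdin.
# (hypothetical helper; uses grep so that jq is not required)
last_round() {
    grep -o '"last-round"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Succeed (exit 0) when the round did not advance between two samples.
is_stalled() {
    local before="$1" after="$2"
    [ "$after" -le "$before" ]
}

# Usage against a live node (assumes ALGORAND_DATA points at the data folder):
# status() {
#     curl -s -H "X-Algo-API-Token: $(cat "$ALGORAND_DATA/algod.token")" \
#         "http://$(cat "$ALGORAND_DATA/algod.net")/v2/status"
# }
# r1=$(status | last_round); sleep 60; r2=$(status | last_round)
# is_stalled "$r1" "$r2" && echo "last-round did not advance: $r1 -> $r2"
```

If last-round stays at the same value over a minute or more while catchup-time keeps growing, the node is not making progress.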

node.log output:

{“file”:“logger.go”,“function”:“github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1",“level”:“info”,“line”:56,“msg”:"10.213.120.116:48600 - - [2023-05-25 17:05:29.979282361 +0000 UTC m=+111184.156884083] "GET /v2/blocks/20145543?format=msgpack HTTP/1.1" 200 175972 "Go-http-client/1.1" 370.318µs”,“time”:“2023-05-25T17:05:29.979672Z”}
{“file”:“logger.go”,“function”:“github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1",“level”:“info”,“line”:56,“msg”:"10.213.120.116:48600 - - [2023-05-25 17:05:31.62331739 +0000 UTC m=+111185.800919112] "GET /v2/blocks/20145544?format=msgpack HTTP/1.1" 200 95139 "Go-http-client/1.1" 358.218µs”,“time”:“2023-05-25T17:05:31.623698Z”}
{“file”:“logger.go”,“function”:“github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1",“level”:“info”,“line”:56,“msg”:"10.213.120.118:59778 - - [2023-05-25 17:05:32.085508188 +0000 UTC m=+111186.263109910] "GET /v2/blocks/18167028?format=msgpack HTTP/1.1" 200 82862 "Go-http-client/1.1" 282.014µs”,“time”:“2023-05-25T17:05:32.085813Z”}
{“file”:“logger.go”,“function”:“github.com/algorand/go-algorand/daemon/algod/api/server/lib/middlewares.(*LoggerMiddleware).handler.func1",“level”:“info”,“line”:56,“msg”:"10.213.120.116:48600 - - [2023-05-25 17:05:33.857210841 +0000 UTC m=+111188.034812563] "GET /v2/blocks/20145545?format=msgpack HTTP/1.1" 200 50229 "Go-http-client/1.1" 250.112µs”,“time”:“2023-05-25T17:05:33.857482Z”}

Did you try to stop/start the node?
Did you check you’re not running out of memory?

If this does not work, can you provide all the information indicated in Node Troubleshooting - Algorand Developer Portal?
(including full node.log file, because the part you provided only shows queries you made to the node, but not errors from algod)

Hi @fabrice ,

Yes, I have stopped and started the node multiple times, and the server is not running out of memory.

with config.json:
{
  "Archival": true,
  "GossipFanout": 4,
  "BaseLoggerDebugLevel": 1,
  "NetAddress": "",
  "EndpointAddress": "0.0.0.0:8887",
  "LogSizeLimit": 1073741824,
  "EnableMetricReporting": false,
  "EnableOutgoingNetworkMessageFiltering": true,
  "EnableIncomingMessageFilter": false,
  "CatchupParallelBlocks": 50,
  "Version": 12,
  "DNSSecurityFlags": 0
}

I am getting the errors below repeatedly in node.log:

Logging Starting
Telemetry Disabled
++++++++++++++++++++++++++++++++++++++++
{“file”:“trackerdb.go”,“function”:“github.com/algorand/go-algorand/ledger.trackerDBInitializeImpl",“level”:“warning”,“line”:123,“msg”:"trackerDBInitialize database schema version is 9, but algod supports only 6”,“name”:“”,“time”:“2023-05-30T12:43:00.435378Z”}
{“file”:“node.go”,“function”:“github.com/algorand/go-algorand/node.MakeFull",“level”:“error”,“line”:210,“msg”:"[Stack] goroutine 1 [running]:\nruntime/debug.Stack(0xc000546690, 0xc0001180b8, 0xc0002aca10)\n\truntime/debug/stack.go:24 +0x9f\ngithub.com/algorand/go-algorand/logging.logger.Errorf(0xc000546690, 0xc0001180b8, 0x1334b28, 0x21, 0xc000117ac0, 0x2, 0x2)\n\tgithub.com/algorand/go-algorand/logging/log.go:229 +0x4a\ngithub.com/algorand/go-algorand/node.MakeFull(0x1c97800, 0xc000114f60, 0x7ffda74a0f6f, 0x16, 0x100000015, 0x4, 0x0, 0x0, 0xdf8475800, 0x0, …)\n\tgithub.com/algorand/go-algorand/node/node.go:210 +0xe8e\ngithub.com/algorand/go-algorand/daemon/algod.(*Server).Initialize(0xc00061db18, 0x100000015, 0x4, 0x0, 0x0, 0xdf8475800, 0x0, 0x0, 0x1e, 0x0, …)\n\tgithub.com/algorand/go-algorand/daemon/algod/server.go:162 +0x903\nmain.run(0x0)\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:295 +0x1243\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:61 +0x7b\n”,“time”:“2023-05-30T12:43:00.438043Z”}
{“file”:“node.go”,“function”:“github.com/algorand/go-algorand/node.MakeFull",“level”:“error”,“line”:210,“msg”:"Cannot initialize ledger (/var/lib/algorand/data/mainnet-v1.0/ledger): reloadLedger.loadFromDisk tracker *ledger.accountUpdates failed to loadFromDisk : Unknown field: spt”,“time”:“2023-05-30T12:43:00.438103Z”}
{“file”:“main.go”,“function”:“main.run”,“level”:“error”,“line”:298,“msg”:“[Stack] goroutine 1 [running]:\nruntime/debug.Stack(0xc000546690, 0xc0001180b8, 0xc0002acf50)\n\truntime/debug/stack.go:24 +0x9f\ngithub.com/algorand/go-algorand/logging.logger.Error(0xc000546690, 0xc0001180b8, 0xc000205410, 0x1, 0x1)\n\tgithub.com/algorand/go-algorand/logging/log.go:219 +0x4a\nmain.run(0x0)\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:298 +0x134b\nmain.main()\n\tgithub.com/algorand/go-algorand/cmd/algod/main.go:61 +0x7b\n”,“time”:“2023-05-30T12:43:00.438173Z”}
{“file”:“main.go”,“function”:“main.run”,“level”:“error”,“line”:298,“msg”:“couldn’t initialize the node: reloadLedger.loadFromDisk tracker *ledger.accountUpdates failed to loadFromDisk : Unknown field: spt”,“time”:“2023-05-30T12:43:00.438189Z”}

Unknown fields usually mean the software is outdated.
Can you provide the following information:

  • OS version
  • machine specs: number of CPU, size of RAM, disk type (NVMe SSD, SATA SSD, …)
  • actual use: memory available, disk available, …
  • algod version: algod -v
  • goal version: goal version -v
  • goal status: goal node status
  • content of config.json (in the data folder $ALGORAND_DATA)
  • link to the files algod-out.log, algod-err.log, node.log (in the data folder $ALGORAND_DATA) uploaded in a Github gist or equivalent system

Can you also double-check that you are executing the right version of algod and you don’t have two different versions (e.g., one installed with the package manager and one installed with update.sh)?
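One way to check for duplicate installations is to list every algod executable on the PATH. The following is a minimal sketch: the function name find_algod is made up for illustration, and it only looks at PATH entries, so a binary started through an absolute path elsewhere would not show up.

```shell
#!/usr/bin/env bash

# Print every executable named algod found in a colon-separated search path.
# Defaults to $PATH when no argument is given. (hypothetical helper)
find_algod() {
    local search_path="${1:-$PATH}" dir
    local IFS=:
    for dir in $search_path; do
        if [ -x "$dir/algod" ]; then
            printf '%s\n' "$dir/algod"
        fi
    done
}

# Usage: print the version reported by each candidate binary.
# for bin in $(find_algod); do echo "== $bin"; "$bin" -v; done
```

More than one line of output, or different versions reported by different paths, would point to the mixed-installation case.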

Hello Sowna,

The messages

"trackerDBInitialize database schema version is 9, but algod supports only 6..."
"couldn't initialize the node: reloadLedger.loadFromDisk tracker *ledger.accountUpdates failed to loadFromDisk : Unknown field: spt"

indicate that you are running a quite old algod that does not support the ledger DB you are using.
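To see how often this mismatch appears in a large node.log, the warning can be filtered out and reduced to the two version numbers. This is a minimal sketch that assumes the message wording is exactly as quoted in this thread; schema_mismatch is a made-up helper name.

```shell
#!/usr/bin/env bash

# Read log lines on stdin and print "<db_version> <supported_version>"
# for each trackerDBInitialize schema warning. (hypothetical helper)
schema_mismatch() {
    sed -n 's/.*database schema version is \([0-9][0-9]*\), but algod supports only \([0-9][0-9]*\).*/\1 \2/p'
}

# Usage (assumes ALGORAND_DATA points at the data folder):
# schema_mismatch < "$ALGORAND_DATA/node.log" | sort -u
```

A database version higher than the supported one means the ledger on disk was written by a newer algod than the binary that is trying to open it.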

Sure @fabrice ,

OS version: Ubuntu 18.04.5 LTS
CPU: 16
RAM: 64GB
Disk Type: SSD
Disk : 8TB
Available Disk : 1.2 TB
algod version: 12885884929
3.15.1.stable [rel/stable] (commit #5f2d5ede)
go-algorand is licensed with AGPLv3.0
source code available at https://github.com/algorand/go-algorand
goal version: 3.15.1.stable
goal node status : root@*****:/# goal node status
Last committed block: 29117368
Time since last block: 0.0s
Sync Time: 207.7s
Last consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Next consensus protocol: https://github.com/algorandfoundation/specs/tree/44fa607d6051730f5264526bf3c108d51f0eadb6
Round for next consensus protocol: 29117369
Next consensus protocol supported: true
Last Catchpoint: 29110000#FI2DUXOJUME3GMA6C2TIQOV2KWT7BOKKJGQCHP7UF6JIO2XIYEKA
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=

Config.json:
{
“Archival”: true,
“GossipFanout”: 4,
“BaseLoggerDebugLevel”: 1,
“NetAddress”: “”,
“EndpointAddress”: “0.0.0.0:8887”,
“LogSizeLimit”: 1073741824,
“EnableMetricReporting”: false,
“EnableOutgoingNetworkMessageFiltering”: true,
“EnableIncomingMessageFilter”: false,
“CatchupParallelBlocks”: 50,
“Version”: 12,
“DNSSecurityFlags”: 0
}
node.log:

algod-err.log:
Cannot read genesis file /var/lib/algorand/genesis.json: open /var/lib/algorand/genesis.json: no such file or directory

algod-out.log is empty

Hope this helps narrow down the issue.

It looks like you are using the user root, which is a bit dangerous and may create issues with permissions.

It is best to run the node under a non-privileged user.
If you install using the package manager, see the notes here: Install a node - Algorand Developer Portal

My guess is still that there are multiple installations of algod or weird permission issues.

  1. Under root, while your node is running, can you run ps auxw | grep algod and show the output? Check there is a single algod process there and take the full path. Then run /full/path/as/in/output/algod -v under root.

If the version is not 3.15.1, this is the issue.
In case algod gets immediately killed and you cannot see the full path of algod, we need to investigate another way.

  2. How do you start your node?
    (Can you specify the exact command you used under the exact user?)

  3. How did you install your node? (Package manager, update.sh, …)

For the following questions, please run the commands under the exact same user as the one you use to start/stop the node (if using goal node start) or the user under which it is run (if you use systemctl)

  4. Can you show your ALGORAND_DATA variable and the permissions there?
    echo $ALGORAND_DATA
    ls -lha $ALGORAND_DATA
  5. Can you run?
    goal version -v
  6. Can you run?
    curl -s -H "X-Algo-API-Token: $(cat $ALGORAND_DATA/algod.token)" "http://$(cat $ALGORAND_DATA/algod.net)/versions"
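To compare the running binary against the expected release, the version line can be pulled out of the algod -v output. This is a minimal sketch that assumes the output format matches what was posted earlier in this thread (a build number line followed by a line such as "3.15.1.stable [rel/stable] ..."); algod_semver and EXPECTED are hypothetical names.

```shell
#!/usr/bin/env bash

# Read `algod -v` output on stdin and print the dotted version, e.g. 3.15.1.
# (hypothetical helper; relies on the "<major>.<minor>.<patch>.<channel>" line)
algod_semver() {
    sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.[a-z].*/\1/p' | head -n 1
}

# Usage: replace the path with the one shown by `ps auxw | grep algod`.
# EXPECTED=3.15.1
# running=$(/full/path/as/in/output/algod -v | algod_semver)
# if [ "$running" != "$EXPECTED" ]; then
#     echo "running algod is $running, expected $EXPECTED"
# fi
```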

Hi @fabrice ,

I am using this Dockerfile to build the image:

FROM debian:stable

RUN apt-get update
RUN apt-get install -y gnupg2 curl software-properties-common
RUN curl -O https://releases.algorand.com/key.pub
RUN apt-key add key.pub
RUN add-apt-repository "deb https://releases.algorand.com/deb/ stable main"
RUN apt-get update
RUN apt-get install -y algorand

VOLUME ["/var/lib/algorand"]
EXPOSE 8887 8886 4160
ENTRYPOINT ["algod"]

Then I use a docker-compose file to bring up the node:
docker-compose.yml
version: '2'
services:
  node-algo:
    image: algorand:latest
    container_name: algo8887
    restart: always
    ports:
      - "8887:8887"
      - "8886:8886"
      - "4160:4160"
    command: ["-d=/var/lib/algorand/data"]
    volumes:
      - /NewDisk23Machine/node-algorand:/var/lib/algorand

This is how I bring up the node.

The root user you saw comes from going inside the container to query the node data.

I’ve never used ENTRYPOINT ["algod"] or started algod directly using algod. I do not know if this is supported at all and it may create significant issues.

Here is an entrypoint that I use: (I don’t think it’s necessarily the best option, but can be used as a starting point)

#!/usr/bin/env bash

# update telemetry
if [ "x$TELEMETRY" = "x1" ]
 then
    diagcfg telemetry name -n "xxxx"
fi

# start node
echo "Starting node"
goal node start -H

while :
 do
    sleep 30m & wait ${!}
    node_round=$(goal node lastround)
    if [ "$?" -ne 0 ]
     then
        echo "Error getting last round from node. exiting..."
        exit 1
    fi
    echo "lastround: ${node_round}"
done

But actually, now there are official Docker images:

so you may want to use those.

Note that they also use goal node start instead of algod as entrypoint:

I run relay in K8S…

This is my start script:

AlgorandNodes/run.sh at main · scholtz/AlgorandNodes · GitHub

#!/bin/bash
if [ ! -f data/config.json ]
then
    cp mainnet/* data/ -R
fi
diagcfg telemetry enable
diagcfg metric enable
goal node start 

while true; do echo `date`; goal node status; sleep 600;done

This is my health probe script:

check="Sync Time: 0.0s"
result=`goal node status | grep "$check"`
if [ "$result" = "$check" ]; then
   echo "0"
else
   echo "1"
   exit 1
fi
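When the probe fails, it can help to log the actual sync time rather than only 0 or 1. This is a minimal sketch that assumes the "Sync Time: <seconds>s" line format shown in the goal node status output earlier in this thread; sync_time and is_synced are made-up helper names.

```shell
#!/usr/bin/env bash

# Read `goal node status` output on stdin and print the Sync Time in seconds.
# (hypothetical helper)
sync_time() {
    sed -n 's/^Sync Time: \([0-9.][0-9.]*\)s$/\1/p'
}

# Succeed when the reported sync time is exactly 0.0, as in the probe above.
is_synced() {
    [ "$1" = "0.0" ]
}

# Usage:
# t=$(goal node status | sync_time)
# if is_synced "$t"; then echo "0"; else echo "1 (Sync Time: ${t}s)"; exit 1; fi
```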

This is how I build a custom Docker image so that it does not run as root:

AlgorandNodes/compose-relaynode-official-3.15.1-stable.sh at main · scholtz/AlgorandNodes · GitHub

AlgorandNodes/compose-relaynode-official.dockerfile at main · scholtz/AlgorandNodes · GitHub

ARG ALGO_VER

FROM algorand/stable:$ALGO_VER
USER root
ENV DEBIAN_FRONTEND noninteractive
RUN apt update && apt dist-upgrade -y && apt install -y mc wget telnet git curl net-tools iotop atop vim dnsutils jq iproute2 && apt-get autoremove --yes && rm -rf /var/lib/{apt,dpkg,cache,log}/
ENV ALGORAND_DATA=/app/data
RUN mkdir /app
RUN mkdir /app/data
RUN mkdir /app/mainnet
RUN mv /root/node /node
RUN cp /node/genesisfiles/mainnet/genesis.json /app/mainnet/genesis.json
ENV PATH=/node:$PATH
WORKDIR /app
COPY . . 
RUN useradd -ms /bin/bash algo
RUN chown algo:algo /app -R
RUN chown algo:algo /node -R
RUN chmod 0700 /app/health.sh
RUN chmod 0700 /app/run.sh
USER algo
CMD ["/bin/bash"]

and k8s deployment:

AlgorandNodes/h1-deployment.yaml at main · scholtz/AlgorandNodes · GitHub
