Ubuntu permissions problems

Hi,

I’m setting up on Ubuntu 18.04 and the installation went smoothly: the “Installing with Debian” instructions work like a treat. algod fired up and the mainnet started to synchronise. I stopped the synchronisation using ‘goal node stop -d /var/lib/algorand’ and it all looked good.

Following the instructions for “Switch Networks - Debian or RPM installs” also went like a breeze: algod fired up again and the testnet started to synchronise. Again, I stopped the synchronisation and everything looked good.

For good measure I started and stopped the mainnet and testnet processes to check that synchronisation continued OK. I also rebooted to check that the processes resumed automatically. Everything still looked good.

My machine does not have the largest of disks and I don’t relish the prospect of filling up the /var partition, so the next step was to move /var/lib/algorand to another partition. I copied the directory structure using ‘sudo cp -rp’ into a directory named ‘algorand’ I’d previously created with appropriate permissions:
drwxr-xr-x 4 algorand algorand 4096 Jul 22 08:37 /home/algorand

I double-checked that the recursive copy had maintained the correct permissions (note I’ve touched the directory and genesis.json since then so the timestamps have changed):
drwxr-xr-x 4 algorand algorand 4096 Jul 23 10:42 /home/algorand/algorand

-rw-r--r-- 1 algorand algorand 714886888 Jul 22 08:34 agreement.cdv
-rw-r--r-- 1 algorand algorand 1073742006 Jul 22 06:48 agreement.cdv.archive
-rw-r--r-- 1 algorand algorand 64 Jul 20 14:10 algod.admin.token
-rw------- 1 algorand algorand 0 Jul 20 14:10 algod.lock
-rw-r--r-- 1 algorand algorand 64 Jul 20 14:10 algod.token
-rw-r--r-- 1 algorand algorand 1996 Jun 25 18:14 config.json.example
drwxrwxr-x 6 algorand algorand 4096 Jul 20 14:10 genesis
-rw-r--r-- 1 algorand algorand 24973 Jul 23 10:42 genesis.json
drwx------ 2 algorand algorand 4096 Jul 20 14:10 mainnet-v1.0
-rw-r--r-- 1 algorand algorand 73431497 Jul 21 23:06 node.log
-rw-r--r-- 1 algorand algorand 58 Jun 25 18:14 system.json
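The “did the recursive copy keep its permissions?” check can also be done mechanically rather than by eye. Below is a minimal sketch of a hypothetical helper (compare_trees is not an Algorand tool) that walks both trees with os.lstat and reports any entry whose mode string, owner uid or group gid differs, or which exists on only one side. Because mainnet-v1.0 is drwx------, it needs to be run with sudo to see inside both copies; it stops with an error rather than silently skipping a directory it cannot list.

```python
import os
import stat
import sys


def _raise(err):
    # Make os.walk fail loudly instead of silently skipping unreadable dirs.
    raise err


def snapshot(root):
    """Map each path (relative to root) to (mode string, uid, gid).

    os.lstat is used so a symlink is recorded as itself, not its target.
    """
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root, onerror=_raise):
        paths = [dirpath] + [os.path.join(dirpath, n) for n in dirnames + filenames]
        for full in paths:
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            entries[rel] = (stat.filemode(st.st_mode), st.st_uid, st.st_gid)
    return entries


def compare_trees(src, dst):
    """Return a list of differences in mode/owner/group between two trees."""
    a, b = snapshot(src), snapshot(dst)
    problems = []
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            problems.append(f"missing in copy: {rel}")
        elif rel not in a:
            problems.append(f"extra in copy: {rel}")
        elif a[rel] != b[rel]:
            problems.append(f"{rel}: {a[rel]} != {b[rel]}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: sudo python3 compare_trees.py /var/lib/algorand.orig /home/algorand/algorand
    # ("/var/lib/algorand.orig" is a hypothetical name for the moved-aside original)
    diffs = compare_trees(sys.argv[1], sys.argv[2])
    print("\n".join(diffs) if diffs else "modes and ownership match")
```

An empty result only means the two trees carry the same permission bits and owners; it says nothing about whether the new location itself can be reached by the service.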

Next I moved the directory /var/lib/algorand out of the way and created a symbolic link to the location of the new directory:
lrwxrwxrwx 1 root root 23 Jul 23 11:37 /var/lib/algorand -> /home/algorand/algorand

I started the mainnet and it did not work! A quick look at /var/log/syslog showed why:
algod[17006]: Cannot read genesis file /var/lib/algorand/genesis.json: open /var/lib/algorand/genesis.json: permission denied

I was puzzled by this, as the permissions looked correct. I double-checked them again and could see nothing wrong. Using “sudo -u algorand cat /var/lib/algorand/” and “sudo -u algorand touch /var/lib/algorand/”, I checked that the algorand user could read and touch the genesis.json file. I also checked that it could create a file “foo” with the correct permissions, which it can:
-rw-r--r-- 1 algorand algorand 0 Jul 23 11:53 foo
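The manual ‘sudo -u algorand’ checks can be scripted so that every step of the path is covered, not just the final file. This is a minimal sketch of a hypothetical helper (check_path_access, not part of the Algorand tools): it resolves the symlink with os.path.realpath, then checks that every directory on the real path is searchable and the file itself is readable for whichever user runs it. Run it as the service user, for example ‘sudo -u algorand python3 check_access.py /var/lib/algorand/genesis.json’ (the script name is an assumption). Like the manual tests, it only sees ordinary file permissions from a login shell; it cannot show whether anything extra applies to algod when it is started as a systemd service.

```python
import os
import sys


def check_path_access(path):
    """Resolve symlinks in path, then test it as the user running this script.

    Returns (resolved_path, problems). Every directory on the resolved path
    needs search (x) permission; the final file needs read permission.
    os.access() answers for this process's real uid/gid, so run it under the
    account the service uses (e.g. via sudo -u algorand).
    """
    real = os.path.realpath(path)
    problems = []

    # Collect the file's parent, its parent, ... up to "/".
    ancestors = []
    parent = os.path.dirname(real)
    while True:
        ancestors.append(parent)
        if os.path.dirname(parent) == parent:  # reached the filesystem root
            break
        parent = os.path.dirname(parent)

    # Check from "/" downwards, as the kernel does during path lookup.
    for directory in reversed(ancestors):
        if not os.access(directory, os.X_OK):
            problems.append(f"no search (x) permission on directory {directory}")

    if not os.path.exists(real):
        problems.append(f"{real} does not exist or cannot be reached")
    elif not os.access(real, os.R_OK):
        problems.append(f"no read permission on {real}")
    return real, problems


if __name__ == "__main__" and len(sys.argv) == 2:
    resolved, issues = check_path_access(sys.argv[1])
    print(f"resolved to {resolved}")
    print("\n".join(issues) if issues else "every component is accessible")
```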

I restored the original algorand directory in /var/lib and, on starting it up, algod worked as expected. I repeated the above steps to move it to the /home partition again, double-checking everything in case I had missed something, and it still threw the same permission error.

I was about to conclude that algod perhaps had a problem with symbolic links, but I thought I’d test it on a different partition to be sure.

I grabbed a spare external USB drive, formatted it and mounted it as /mnt/algorand. I repeated the same steps and created /mnt/algorand/algorand:
drwxr-xr-x 5 algorand algorand 4096 Jul 23 08:29 /mnt/algorand/algorand

I set the symbolic link to point here:
lrwxrwxrwx 1 root root 22 Jul 23 12:02 algorand -> /mnt/algorand/algorand

I tried again, and this time it worked!
algod[6776]: Initializing the Algorand node… Success!

$ goal node status -d /var/lib/algorand
Last committed block: 164491
Time since last block: 0.0s
Sync Time: 118.9s
Last consensus protocol: https://github.com/algorandfoundation/specs/tree/5615adc36bad610c7f165fa2967f4ecfa75125f0
Next consensus protocol: https://github.com/algorandfoundation/specs/tree/5615adc36bad610c7f165fa2967f4ecfa75125f0
Round for next consensus protocol: 164492
Next consensus protocol supported: true
Last Catchpoint: 160000#2WKICX3WIAQIOQUNZPNMT6VX4O2346HQKF3KKUXACHEDKGSKBUNA
Genesis ID: mainnet-v1.0
Genesis hash: wGHE2Pwdvd7S12BL5FaOP20EGYesN73ktiC1qzkkit8=

I put the mainnet back onto the /home partition and got the permissions problem again.

My attention now turned to the testnet and I replicated the same steps to put it on /mnt/algorand (at this time the mainnet directory remained on /home):
drwxr-xr-x 3 algorand algorand 4096 Jul 23 08:38 /mnt/algorand/algorand_testnet

Using the following commands to start and stop the service, I found that the testnet would not start:
sudo systemctl start algorand@$(systemd-escape /var/lib/algorand_testnet)
sudo systemctl stop algorand@$(systemd-escape /var/lib/algorand_testnet)
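The ‘systemd-escape’ call in these commands turns the data-directory path into a name that is valid as a systemd template instance, so the unit that actually gets started is ‘algorand@-var-lib-algorand_testnet.service’. To make that visible, here is a rough Python approximation of systemd-escape’s default (non ‘--path’) rules: slashes become dashes, and a leading dot or any byte outside ASCII letters, digits, ‘:’, ‘_’ and ‘.’ becomes \xNN. It is an illustrative sketch, not systemd’s own code; ‘systemd-escape --unescape’ reverses the real thing, and the real tool’s output is what to trust.

```python
import string

# Characters left as-is in unit names: ASCII letters, digits, ":", "_", ".".
# "-" and "\" are deliberately excluded so they always get escaped.
ALLOWED = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape(value: str) -> str:
    """Approximate `systemd-escape VALUE` in its default (non --path) mode."""
    out = []
    for i, byte in enumerate(value.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")                # path separators become dashes
        elif ch in ALLOWED and not (i == 0 and ch == "."):
            out.append(ch)                 # safe character, kept as-is
        else:
            out.append(f"\\x{byte:02x}")   # anything else: \xNN, lowercase hex
    return "".join(out)


if __name__ == "__main__":
    instance = systemd_escape("/var/lib/algorand_testnet")
    print(f"algorand@{instance}.service")  # algorand@-var-lib-algorand_testnet.service
```

Because ‘_’ is allowed, the testnet path needs no \xNN escapes at all; only the slashes change.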

A quick check of /var/log/syslog revealed that algod was attempting to access /var/lib/algorand and failing with permission problems:
algod[7228]: Permission error on accessing telemetry config: mkdir /var/lib/algorand/.algorand: permission denied

I moved the mainnet configuration back to /mnt/algorand/algorand and the testnet started as expected.

I have a working node running mainnet and testnet using an external USB drive for the algorand storage. It is OK for now but I’d really like to understand why algod will not run when the algorand directories reside on the /home partition. I’m sure I’ve done something daft but can’t seem to spot it.

@rmb
The only thing in your detailed scenario that I think could cause this issue is the symbolic link.
Try to make sure the algorand user has access to it. Besides that, I can’t see why algod would care where the underlying files are.

Thanks @tsachi.
I’ve been over this several times and the symlinks are standard: root-owned, with rwx for user, group and others. The fact that they work when pointing to /mnt/algorand/algorand[_testnet] and don’t work when pointing to /home/algorand/algorand[_testnet] seems to rule out an issue with the symlinks themselves.

In the next day or so I will take my system down, shrink the /home partition and create a new partition. I’ll mount it as /algorand and then try the symlinks pointing to /algorand/algorand[_testnet].

Hopefully there will be a lightbulb moment soon as the permissions issues when pointing to the /home partition are really bugging me.