What's the rationale behind the bespoke 25-word mnemonic standard?

Say I have 256 bits of entropy and a corresponding BIP39 mnemonic.

Example:
entropy (in hex): 213e0a0ed186eff274503c4bb28d901430c71785df5cd37a7b4b150e0ad0e3c1
BIP39 mnemonic: cancel utility lonely perfect humble weird spend always entry nerve goat chronic arrest mesh blast twist square stable spot claw thing half monitor demand

BIP39 mnemonic generation is pretty straightforward. Read the bits in 11-bit chunks, left to right, and look up the corresponding word in the BIP39 word list. For the 24th word, take the leftover 3 bits of entropy and append an 8-bit checksum (the first 8 bits of SHA-256 of the entropy) to form the final 11-bit index.
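For concreteness, here's a minimal sketch of that chunking in Go (standard library only; `bip39Indices` is my own helper name, and the word-list lookup is omitted):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bip39Indices splits 256-bit entropy into 24 big-endian 11-bit indices,
// appending the first 8 bits of SHA-256(entropy) as the checksum.
func bip39Indices(entropy []byte) []uint16 {
	checksum := sha256.Sum256(entropy)
	bits := append(append([]byte{}, entropy...), checksum[0]) // 256 + 8 = 264 bits

	indices := make([]uint16, 0, 24)
	for i := 0; i < 24; i++ {
		var idx uint16
		for j := 0; j < 11; j++ {
			bit := i*11 + j
			// Read bit `bit`, counting from the most significant bit of bits[0].
			if bits[bit/8]&(0x80>>(bit%8)) != 0 {
				idx |= 1 << (10 - j)
			}
		}
		indices = append(indices, idx)
	}
	return indices
}

func main() {
	entropy, _ := hex.DecodeString("213e0a0ed186eff274503c4bb28d901430c71785df5cd37a7b4b150e0ad0e3c1")
	// First index is 0b00100001001 = 265, which maps to "cancel" in the example above.
	fmt.Println(bip39Indices(entropy))
}
```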

Now say I want to convert a BIP39 mnemonic to the 25-word Algo standard (ALGO25 hereafter; let me know if there's a standard name for it). I'd just convert the BIP39 mnemonic back to raw entropy and then generate the ALGO25 mnemonic from that. Going by the docs at Overview - Algorand Developer Portal, I would have expected the first 23 words of the two standards to be identical, with only the 24th and 25th words differing. In ALGO25 the 24th word contains no checksum bits and is derived solely from the last 3 bits of the entropy, with a whole new checksum word generated as the 25th word.

However, when I generated an ALGO25 mnemonic using the same entropy above, I got a completely different mnemonic: service cigar mandate home tenant device lonely detail enact cycle embark access mobile echo wire fresh foil unknown pizza throw beef village audit absorb abuse

This is using the official Algo SDK for Go.
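For reference, the round trip looks roughly like this (I'm assuming github.com/tyler-smith/go-bip39 for the BIP39 side; the ALGO25 side is the SDK's mnemonic package):

```go
package main

import (
	"fmt"

	"github.com/algorand/go-algorand-sdk/mnemonic"
	bip39 "github.com/tyler-smith/go-bip39"
)

func main() {
	bip39Mnemonic := "cancel utility lonely perfect humble weird spend always entry nerve goat chronic arrest mesh blast twist square stable spot claw thing half monitor demand"

	// Recover the raw 256-bit entropy from the BIP39 mnemonic.
	entropy, err := bip39.EntropyFromMnemonic(bip39Mnemonic)
	if err != nil {
		panic(err)
	}
	fmt.Printf("entropy: %x\n", entropy)

	// Feed the same 32 bytes into the Algorand SDK.
	algo25, err := mnemonic.FromKey(entropy)
	if err != nil {
		panic(err)
	}
	fmt.Println("ALGO25:", algo25)
}
```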

My gut feeling was that it had to do with endianness. Ignoring the checksum for now, the mnemonic.FromKey(entropy) conversion goes like this: entropy (256 bits) → conversion to 11-bit words (mnemonic.toUint11Array) → look up each 11-bit word in the word list.

Here is the raw binary before and after the toUint11Array conversion:

original entropy (11-bit delimited): 00100001001 11110000010 10000011101 10100011000 01101110111 11111001001 11010001010 00000111100 01001011101 10010100011 01100100000 00101000011 00001100011 10001011110 00010111011 11101011100 11010011011 11010011110 11010010110 00101010000 11100000101 01101000011 10001111000 001

toUint11Array: 11000100001 00101000111 10000111000 01101101000 11011111000 00111100101 10000011101 00111100010 01001001011 00110110110 01001000010 00000001010 10001110011 01000101111 11111100001 01011100110 01011010011 11101101111 10100101101 11100001010 00010100000 11110100000 00001111000 110 

Notice how the output of toUint11Array looks completely different from the original entropy simply divided into 11-bit chunks. Bizarre… The logic of toUint11Array goes something like this (paraphrased from the Go SDK source, not verbatim):
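```go
// toUint11Array converts a byte slice into 11-bit chunks, least significant
// bits first.
func toUint11Array(arr []byte) []uint32 {
	var buffer uint32  // bit accumulator
	var numBits uint32 // number of bits currently held in buffer
	out := []uint32{}

	for _, b := range arr {
		// Each new byte is placed ABOVE the bits already in the buffer,
		// i.e. the input is consumed little-endian.
		buffer |= uint32(b) << numBits
		numBits += 8

		if numBits >= 11 {
			// Emit the LOWEST 11 bits of the buffer, not the highest.
			out = append(out, buffer&0x7ff)
			buffer >>= 11
			numBits -= 11
		}
	}
	if numBits != 0 {
		out = append(out, buffer&0x7ff)
	}
	return out
}
```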

To me, the implementation of mnemonic.toUint11Array is quite unintuitive. It looks like a hot mess of mixed endianness, neither little nor big. Or is this simply how little-endian works? I haven't really dealt with binary encodings or endianness before, so if I'm misunderstanding something here, please do educate me.
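Working through the first two bytes of the entropy by hand (my own derivation, so correct me if I've misread the code):

```text
byte 0 = 0x21 = 00100001, byte 1 = 0x3e = 00111110
buffer after two bytes = 0x3e21 = 00111110 00100001   (byte 1 shifted above byte 0)
lowest 11 bits         = 110 0010 0001 = 11000100001
```

That is exactly the first 11-bit word in the toUint11Array dump above: each output word takes its low bits from the earlier byte and its high bits from the later one.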

In any case, it doesn't feel like the ALGO25 mnemonic generation was intended to be like this; rather, it became what it is as a side effect of how toUint11Array is implemented. Why the hassle of a bespoke standard anyway? I'd say it would have been at least acceptable if the standard were formally documented somewhere, especially the bit around endianness, but I can't find it anywhere (the most detail I got is from Overview - Algorand Developer Portal). Contrast this with the BIP39 standard, whose spec I can read and use to do the conversion by hand pretty easily if I wanted to.

Interesting write-up. BIP-39 is widely implemented, so it would have the added advantage of wider support.
