How unusual is it to produce an account seed phrase that contains the same word twice?
Each mnemonic word is equally likely to be chosen, so generating a phrase is equivalent to drawing uniformly at random from a list of 2048 words. When the draw is repeated 25 times, the probability that at least one word appears more than once is approximately 13.68%.
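A quick simulation can sanity-check that figure. This is only a sketch under the same assumption as above (25 independent, uniform draws from 2048 words); the names `has_duplicate` and `estimate_duplicate_rate` are made up for illustration, and words are represented by their index rather than by a real word list.

```python
import random

WORDLIST_SIZE = 2048  # words in the mnemonic word list
PHRASE_LENGTH = 25    # words per seed phrase


def has_duplicate(rng: random.Random) -> bool:
    """Draw one phrase as word indices and report whether any index repeats."""
    words = [rng.randrange(WORDLIST_SIZE) for _ in range(PHRASE_LENGTH)]
    return len(set(words)) < PHRASE_LENGTH


def estimate_duplicate_rate(trials: int, seed: int = 0) -> float:
    """Fraction of simulated phrases containing at least one repeated word."""
    rng = random.Random(seed)
    hits = sum(has_duplicate(rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"Estimated duplicate rate: {estimate_duplicate_rate(100_000):.4f}")
```

With enough trials the estimate should settle close to the 13.68% quoted above, within ordinary sampling noise.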
Me:
Please solve the following problem: We have a passphrase consisting of 25 words, and each word can be chosen from 2048 different words. What is the probability that a randomly selected passphrase will not contain any duplicate words?
ChatGPT 4o:
Calculating the Probability
To determine the probability, we need to calculate the number of ways to create a 25-word passphrase without repetition and then divide it by the total number of possible passphrases.
- Total Possible Passphrases (Allowing Repetitions)
Since each word can be chosen from 2048 different options and the passphrase consists of 25 words, the total number of possible passphrases is:
Total possibilities = 2048**25
- Number of Passphrases Without Repetition
If there are no repeated words, the first word has 2048 choices, the second word has 2047, the third has 2046, and so on. This is a permutation, so the number of passphrases without repetition is:
Non-repeating possibilities = P(2048, 25) = 2048! / (2048 - 25)!
- The Probability We Are Looking For
The probability of generating a random passphrase where no word is repeated is:
P = P(2048, 25) / 2048**25
Now, let's compute this value exactly.
import math
# Data
total_words = 2048 # Number of words in the set
phrase_length = 25 # Length of the passphrase
# Number of possibilities without repetition (permutation)
perm_no_repeats = math.prod(range(total_words, total_words - phrase_length, -1))
# Total possible passphrases (allowing repetition)
total_permutations = total_words ** phrase_length
# Compute the probability
probability = perm_no_repeats / total_permutations
# Print the result rounded to 4 decimal places
print(f"P = {probability:.4f}")
Final Result
Based on the calculation, the probability that a randomly chosen 25-word passphrase will not have any duplicate words is approximately 86.32%.
Me:
So the probability of having at least one repetition is 100% - 86.32% = 13.68%.
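As a cross-check, the same number can be compared with the usual birthday-problem approximation, 1 - exp(-n(n-1) / (2N)), with n = 25 and N = 2048. The sketch below is not part of the original exchange; the function names are hypothetical, and it assumes Python 3.8+ for `math.perm`.

```python
import math
from fractions import Fraction


def exact_duplicate_probability(n_words: int, length: int) -> float:
    """1 - P(n_words, length) / n_words**length, using exact integer arithmetic."""
    no_repeat = Fraction(math.perm(n_words, length), n_words ** length)
    return float(1 - no_repeat)


def approx_duplicate_probability(n_words: int, length: int) -> float:
    """Birthday-problem approximation: 1 - exp(-n(n-1) / (2N))."""
    return 1 - math.exp(-length * (length - 1) / (2 * n_words))


if __name__ == "__main__":
    exact = exact_duplicate_probability(2048, 25)
    approx = approx_duplicate_probability(2048, 25)
    print(f"exact  = {exact:.4%}")
    print(f"approx = {approx:.4%}")
```

The approximation comes out slightly below the exact value, because it drops the higher-order terms of ln(1 - i/N), but the two agree to within a fraction of a percentage point.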
Note: the 25 passphrase words encode 256 bits of entropy plus 8 bits of checksum, so the last word contains 3 random bits and 8 checksum bits. The words are therefore not all independently random, and the figure above is an approximation.