🟩🟩🟩🟩🟩 Optimal Wordle

John L Stechschulte
Towards Data Science
Jan 20, 2022

Everything you’ve read about Wordle is wrong. Or, at least, everything I’ve read about choosing your first Wordle guess makes a poor assumption: that you should guess as many of the most common letters as possible. Now, it turns out that the best first guess (which I will get to) is composed of very common letters, but it ranks 24th on the list of words by average letter frequency. Why is guessing the most common letters not the best strategy? Because these letters are not very discriminative: they’re in a lot of words, so if you see yellow or green when you guess them, you still have a lot of words to sift through.

Let’s look at a simple example. Suppose you’re playing a game of two-letter Wordle with the short (and non-English) word list AB, AE, AS, AT, BE, SE, TE. Also assume that all words in the list are equally likely to be chosen as the hidden word. Clearly, the most common letters in this list are A and E, and, lo and behold, you can even put them together and guess AE to start with, which guarantees that you’ll see green!

If you start with the guess AE, you get 🟩🟩 with probability 1/7, ⬛🟩 with probability 3/7, and 🟩⬛ with probability 3/7. You have a 1/7 chance of winning, and otherwise you know the vowel. If it’s A, the remaining words in your list are AB, AS, AT. If it’s E, you’re left with BE, SE, TE. So you have a 6/7 chance of having 3 words left to guess from.

Now, let’s look at guessing BE first. Again, you have a 1/7 chance of 🟩🟩 (which is always true, regardless of your first guess). You have a 3/7 chance of ⬛🟩 (AE, SE, TE). You also have a 2/7 chance of ⬛⬛ (AS, AT), and a 1/7 chance of 🟨⬛ (AB).

If you guess AE and you’re wrong, you’re guaranteed to have three possible words left. If you guess BE and you’re wrong, you will have 3 words 50% of the time, 2 words 33% of the time, and just 1 word the other 17%. So you have a 50% chance of being better off than if you started with AE.
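The case analysis above can be checked mechanically. Here is a minimal sketch (not the script discussed later in this article) that groups the seven words by the tile pattern each would produce as the hidden word. The names `feedback` and `buckets` are my own, and the handling of repeated letters follows my understanding of standard Wordle rules, which this toy list never exercises.

```python
from collections import Counter, defaultdict

WORDS = ["AB", "AE", "AS", "AT", "BE", "SE", "TE"]

def feedback(guess: str, answer: str) -> str:
    """Return the tile pattern shown for `guess` when `answer` is the hidden word."""
    tiles = ["⬛"] * len(guess)
    # Answer letters that were not matched exactly can still earn a yellow tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "🟩"
        elif unmatched[g] > 0:
            tiles[i] = "🟨"
            unmatched[g] -= 1
    return "".join(tiles)

def buckets(guess: str, words: list[str]) -> dict[str, list[str]]:
    """Group the candidate hidden words by the pattern `guess` would produce."""
    groups = defaultdict(list)
    for answer in words:
        groups[feedback(guess, answer)].append(answer)
    return dict(groups)

for guess in ("AE", "BE"):
    print(guess, buckets(guess, WORDS))
```

AE splits the list into groups of sizes 1, 3, and 3, while BE splits it into groups of sizes 1, 3, 2, and 1, matching the probabilities worked out by hand.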

In this basic example, it’s easy to work through all the cases. How does this generalize to five-letter Wordle based on a list of 12,972 words (which is how many five-letter words are in Collins Scrabble Words)? The answer is entropy. In information theory, entropy is a measure of how random something is. In this situation, we want to guess the word that has the most random outcome: the guess for which we are least able to predict the pattern of green, yellow, and gray tiles that will result. At one extreme, if we already know the outcome for a guess, say by guessing the same word twice, then it’s a wasted guess, even if it does contain a lot of green. More entropy means having more possible outcomes and having the probabilities of those outcomes more balanced.

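Warning: here comes the math. I’ll keep it brief. The information entropy is defined as

$$H = -\sum_{o} p(o) \log p(o)$$

where the sum runs over the possible outcomes $o$ of a guess (the patterns of green, yellow, and gray tiles) and $p(o)$ is the probability of seeing outcome $o$.

How can we make sense of this value? First, the sum of a probability times something is an expectation, so the entropy is the expected value of the negative log probability of the outcome:

$$H = \mathbb{E}\left[-\log p(o)\right]$$

What’s the negative log probability of the outcome? In this context, the probability of the outcome is the number of words that fit that outcome, $n_o$, divided by the total number of words in the list, $N$. That fact, and some rules of logarithms, lead us to this:

$$H = \mathbb{E}\left[\log N - \log n_o\right]$$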

This is a measure of how big a step we’ve taken to a solution. We start with the original word list, and we end up with a shorter list of words that fit the outcome we observe, so this difference is how much of the problem we’ve solved. The logarithm is because this process is a branching search: with each guess, we split the remaining words into several buckets, and the outcome tells us which bucket we’ll keep. If you’re splitting relatively evenly, the number of splits required to get to a single word is logarithmic in the number of words you’re starting with.
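As a worked check, here is the entropy of each opening guess in the two-letter example, computed as the sum over outcomes of (words in bucket / total words) × log(total words / words in bucket). AE splits the seven words into buckets of 1, 3, and 3; BE splits them into buckets of 1, 3, 2, and 1. The base-2 logarithm is my assumption; a different base only rescales the values and does not change the ranking.

$$H(\text{AE}) = \tfrac{1}{7}\log_2 7 + 2\cdot\tfrac{3}{7}\log_2\tfrac{7}{3} \approx 1.45 \text{ bits}$$

$$H(\text{BE}) = 2\cdot\tfrac{1}{7}\log_2 7 + \tfrac{3}{7}\log_2\tfrac{7}{3} + \tfrac{2}{7}\log_2\tfrac{7}{2} \approx 1.84 \text{ bits}$$

BE has the higher entropy, which agrees with the earlier conclusion that it is the better opening guess.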

Putting it all together, the entropy is the expected amount of progress towards a solution that will result from guessing a particular word. I wrote a Python script to calculate the entropy for all possible guesses, given a word list (and assuming the answer is selected uniformly at random from the list). Running it on the Collins Scrabble Words, the top eleven first guesses are:
1. TARES
2. LARES
3. RALES
4. RATES
5. NARES
6. TALES
7. TORES
8. REAIS
9. DARES
10. ARLES
11. LORES
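
A ranking like this can be produced with a short program. The following is a minimal sketch of the approach, not the actual script used for the list above: the function names (`feedback`, `entropy`, `rank_guesses`), the base-2 logarithm, and the treatment of repeated letters are all my assumptions. It is shown on the two-letter toy list; loading a real word list is left out.

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Tile pattern as a tuple: 2 = green, 1 = yellow, 0 = gray."""
    pattern = [0] * len(guess)
    spare = Counter()  # answer letters not matched in place
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = 2
        else:
            spare[a] += 1
    for i, g in enumerate(guess):
        if pattern[i] == 0 and spare[g] > 0:
            pattern[i] = 1
            spare[g] -= 1
    return tuple(pattern)

def entropy(guess, answers):
    """Expected log2(N / n) over outcomes, with every answer equally likely."""
    counts = Counter(feedback(guess, answer) for answer in answers)
    total = len(answers)
    return sum(n / total * math.log2(total / n) for n in counts.values())

def rank_guesses(words):
    """Return (entropy, word) pairs, highest entropy first."""
    return sorted(((entropy(w, words), w) for w in words), reverse=True)

TOY_WORDS = ["AB", "AE", "AS", "AT", "BE", "SE", "TE"]
for score, word in rank_guesses(TOY_WORDS):
    print(f"{word}: {score:.3f} bits")
```

Scoring every guess against every possible answer takes a number of `feedback` calls equal to the square of the list size, roughly 168 million for 12,972 words, so a pure-Python loop like this one may be slow on the full list.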

Like I said earlier, these words do contain some very common letters. However, the first guesses that will get you the most colored tiles, AROSE/AEROS/SOARE, rank 298th, 25th, and 20th, respectively. Also, there is only one word in the top 11 that has three vowels. The first word with four vowels is AUREI, ranked 1,044th. Knowing which vowels are in the word doesn’t actually help that much.

What do you think? Did I make a mistake somewhere? Will this impact how you play Wordle? Let me know in the comments below.
