Solving the New York Times Spelling Bee can be a rewarding experience that balances a challenge with the pleasure of word exploration. While it’s not always a walk in the park, the satisfaction gained from finding each word is well worth the effort. Among the various linguistic achievements in the puzzle, uncovering the pangram is like discovering a hidden treasure. This special word, which uses all the given letters, highlights the player’s skill in navigating the rich complexities of the English lexicon.
Finding the pangram is exhilarating for many players, and it also makes a compelling exercise in natural language processing (NLP). spaCy (Honnibal & Montani, 2017), open-sourced under the MIT license, is my favorite tool for such tasks. You could write a spaCy program by hand, but I’d like to show you how to develop such a solution using GPT-4.
Background
Spelling Bee
The New York Times Spelling Bee is a popular word puzzle game that appears in the New York Times newspaper and on its website. In the game, players are given a set of seven letters, with one of the letters designated as the "center" letter. The objective is to create as many words as possible from the given letters while adhering to the following rules:
- Each word must be at least four letters long.
- The "center" letter must appear in every word.
- Words must be in the English dictionary.
- Proper nouns and obscure or offensive words are not allowed.
The game assigns a point value to each word based on its length. Players receive one point for a four-letter word, and the point value increases with each additional letter. A pangram is a word that uses all seven given letters at least once, and it awards bonus points.
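To make the scoring concrete, here is a small sketch in Python. It assumes the commonly published values (longer words score one point per letter, and the pangram bonus is seven points); the score function is our own illustration, not part of any library.

def score(word, letters):
    """Score a word under the Spelling Bee rules sketched above.

    Assumes the commonly published values: 4-letter words earn 1 point,
    longer words earn one point per letter, and a pangram earns a
    7-point bonus.
    """
    points = 1 if len(word) == 4 else len(word)
    if set(letters) <= set(word):  # uses all seven letters: a pangram
        points += 7
    return points

# 'daytime' uses all of a, d, e, m, t, y, i: 7 (length) + 7 (bonus) = 14
print(score('daytime', 'ademtyi'))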
GPT-4
GPT, or Generative Pre-trained Transformer, is a cutting-edge AI language model developed by OpenAI that leverages deep learning techniques to comprehend and generate human-like text. With its powerful transformer architecture and pre-training on vast amounts of textual data, GPT is capable of impressive performances across a wide range of natural language processing tasks, including text completion, translation, summarization, and more.
spaCy
spaCy is a high-performance, open-source Python library designed for advanced natural language processing (NLP) tasks. Developed by Explosion AI, spaCy offers efficient, production-ready tools for text processing, tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and more. Built with a focus on speed and ease of use, spaCy enables developers to quickly build custom NLP applications.
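For a quick taste of the API, the snippet below tokenizes a sentence and prints each token’s part-of-speech tag (it assumes the small English model, en_core_web_sm, is installed):

import spacy

# Load the small English pipeline
nlp = spacy.load("en_core_web_sm")

doc = nlp("The quick brown fox jumps over the lazy dog.")
for token in doc:
    print(token.text, token.pos_)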
Develop a Program in spaCy with GPT-4
Get Ready
Let’s get the toolchain ready to develop with spaCy. You should have Python and pip installed, as spaCy is a Python library.
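If spaCy itself is not installed yet, a typical setup (assuming a standard pip environment) is:

pip install spacy
python -m spacy download en_core_web_sm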
To write a spaCy program to find pangrams in the NYT Spelling Bee, we’ll need a list of words from a dictionary or a corpus. For this example, I’ll use the NLTK (Natural Language Toolkit) library (Bird et al., 2009) to get a word list. Install NLTK if you haven’t already:
pip install nltk
Then, download the ‘words’ corpus from NLTK:
import nltk
nltk.download('words')
Note: NLTK’s words corpus is a list of English words. It is sourced from the "Words" dataset, originally derived from the Unix words file, a word list commonly used on Unix systems for purposes such as spell-checking and generating random text. The Unix words file is not an exhaustive list of English words, and it may contain some archaic or uncommon entries. It also lacks proper nouns and may not include more recent additions to the language. Nonetheless, it provides a useful starting point for various NLP tasks, such as word segmentation, tokenization, and spelling correction.
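A quick inspection gives a feel for the corpus (the exact count may vary with the NLTK version):

from nltk.corpus import words

word_list = words.words()
print(len(word_list))          # roughly 236,000 entries
print('daytime' in word_list)  # True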
First Attempt
Let’s make a straightforward request to GPT-4.
Can you write a program in SpaCy to find pangrams in the New York Times’s Spelling Bee?
The answer provided by GPT-4 looks promising. It includes the steps to set up spaCy and the following Python program.
import spacy
import nltk
from nltk.corpus import words

# Load the English language model (note: loaded but never used below)
nlp = spacy.load("en_core_web_sm")

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if required_letter not in word:
        return False
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from nltk
    nltk_words = words.words()

    # Define the required letter and other given letters
    required_letter = 'i'
    given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']

    # Find pangrams
    pangrams = find_pangrams(nltk_words, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()
This is the output.
>>> pangrams
{'diathermotherapy', 'pentamethylenediamine', 'absentmindedly',
'undemonstratively', 'hypodermatically', 'bisdimethylamino',
'endolymphatic', 'Polystomidae', 'metasedimentary', 'Methodistically',
'thermodynamicist', 'vasoepididymostomy', 'animatedly', 'radiochemistry',
'hydrometeorological', 'demisability', 'hydrometallurgically',
'Mylodontidae', 'ektodynamorphic', 'detrimentally', 'Tetradynamia',
'immoderately', 'Merycopotamidae', 'Hypsiprymnodontinae', 'lymphadenitis',
'diaphanometry', 'Cyclostomatidae', 'Platystomidae', 'smithydander',
'hydroatmospheric', 'hydropneumatic', 'condimentary', 'hydrometamorphism',
'pentadactylism', 'diathermacy', 'ventripyramid', 'acidometry',
'thermodynamist', 'improvidentially', 'methodically', 'diethylenediamine',
'aldermanity', 'hemidactylous', 'myxadenitis', 'Myliobatidae',
'demountability', 'diathermy', 'lymphangioendothelioma', 'unmitigatedly',
'Basidiomycetes', 'Hydrometridae', 'ectodactylism', 'thermoradiotherapy',
'predaytime', 'indemnificatory', 'Hemibasidiomycetes', 'dilemmatically',
'cardiometry', 'lymphadenectasia', 'misunderstandingly',
'diastematomyelia', ...}
Taking a closer look, we find that the words include letters that are not in the given set.
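You can confirm this directly in a Python shell; for example, 'absentmindedly' from the output uses several letters outside the given set (the set’s print order may vary):

>>> given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']
>>> set('absentmindedly') - set(given_letters)
{'b', 's', 'n', 'l'}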
Second Attempt
With this observation, we can ask GPT-4 to improve the program.
The program does not work properly. It gives words containing the "given_letters", but what we need is words that contain ONLY the given_letters. They cannot have any other letters.
GPT-4 rewrites the program.
import nltk
from nltk.corpus import words

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if required_letter not in word:
        return False
    # Reject words that use any letter outside the given set
    for letter in word:
        if letter not in given_letters:
            return False
    # A pangram must still use every given letter at least once
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from nltk
    nltk_words = words.words()

    # Define the required letter and other given letters
    required_letter = 'i'
    given_letters = ['a', 'd', 'e', 'm', 't', 'y', 'i']

    # Find pangrams
    pangrams = find_pangrams(nltk_words, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()
This time the output contains three words. The word the NYT Spelling Bee is looking for is ‘daytime’.
>>> pangrams
{'didymate', 'daytime', 'Maytide'}
Can we call this a win?
An Exception and the Solution
The Spelling Bee of May 17, 2023, had the given letters C, D, E, N, L, O, and W, but our program returned an empty result: it could not find a pangram.
This is where we need to help GPT-4. I tried asking GPT-4 to analyze the mistake, but that did not work well, so I stepped in. The algorithm looked correct, so my next suspect was the NLTK corpus. I ran a quick search in Python for that day’s pangram, ‘clowned’, and it was not in the corpus. We had found the problem.
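That quick search amounts to a one-liner:

>>> from nltk.corpus import words
>>> 'clowned' in words.words()
False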
I stated the issue and asked GPT-4 for a new solution.
In the Python program for finding pangrams, the ‘words’ corpus in nltk.corpus does not cover all the words the NYTimes uses. For instance, ‘clowned’ is not included. Is there a better place to retrieve the ‘words’?
GPT-4 acknowledged the issue and provided a new source for English words, along with the updated Python program.
A more comprehensive source for English words is the words.txt file, which is derived from the SCOWL (Spell Checker Oriented Word Lists) project. You can find it in the GitHub repository here: https://github.com/dwyl/english-words
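Before running the updated program, download words.txt into your working directory. Assuming the file is still hosted on the repository’s master branch, one way to fetch it is:

curl -O https://raw.githubusercontent.com/dwyl/english-words/master/words.txt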
def read_words_from_file(filename):
    with open(filename, 'r') as f:
        return [word.strip() for word in f.readlines()]

def is_pangram(word, required_letter, given_letters):
    word = word.lower()
    if len(word) < 4 or required_letter not in word:
        return False
    for letter in word:
        if letter not in given_letters:
            return False
    return all(l in word for l in given_letters)

def find_pangrams(word_list, required_letter, given_letters):
    pangrams = set()
    for word in word_list:
        if is_pangram(word, required_letter, given_letters):
            pangrams.add(word)
    return pangrams

def main():
    # Get the list of words from the words.txt file
    filename = 'words.txt'
    word_list = read_words_from_file(filename)

    # Define the required letter and other given letters
    required_letter = 'c'
    given_letters = ['c', 'd', 'e', 'n', 'l', 'o', 'w']

    # Find pangrams
    pangrams = find_pangrams(word_list, required_letter, given_letters)
    print("Pangrams found:", pangrams)

if __name__ == "__main__":
    main()
Save the new program as pangram.py. Now the result is as follows:
% python3 pangram.py
Pangrams found: {'clowned'}
It is satisfactory!
Discussion
Putting GPT-4 to a practical task is a fun way to test the model’s capabilities. It certainly is impressive and will change the way we work. Here are a few things that might help when giving prompts to GPT-4.
- Be specific. Define a proper scope for your questions.
- Give examples to illustrate your statement.
- Work with GPT-4, and always validate the outputs.
- Use your own judgment to analyze complicated situations.
References
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.