Why did the swimmer chant? To become a swim merchant!
By the end of this article you will have an AI that can generate hundreds of thousands of jokes with the click of a button.
I have always loved puns of all kinds. Some clever, some unapologetically clunky and endearing. My favourites are homographic puns, which use words that are written similarly but carry different meanings or sounds. A pirate might yell "me trousers!", whereas a commuter might complain about the increasing number of metro users. Connecting these in a clever way creates a delightful mental image of a pirate getting his pants stolen in a metro. Coming up with these is a fun pastime in itself, but being a computer scientist I began to wonder whether we could automate this type of humor altogether. Humor is considered a uniquely human trait, one that separates us from machines. What would a joke-spitting machine look like?
This article covers a short afternoon's foray into procedural joke generation. It involves some clever data structuring to build usable joke baselines, as well as a neural network that turns those baselines into jokes. By the end, we have a working joke machine that can churn out hundreds of thousands of jokes in seconds. The program was originally written for my native language (Finnish), but given how much better the available NLP models are for English, I adapted it to the lingua franca of our times. With very limited tuning the same principle can be used to spit out jokes in French, Spanish or Mongolian.
As you might have guessed, the first joke of the article was generated by a machine.
How many Tries does it take to generate a good joke?
one
Technically, it takes one Trie to generate all of the jokes. Our goal is rather simple: come up with a way to calculate all of the ways the words in the English corpus can be divided and re-arranged into new words. In this challenge we are looking at a very specific group of words, of which a good example might be maples and lessons. These can be rearranged into map lessons and maples sons. For a joke, you might instantly imagine a Toronto hockey fan studying a map. To be exact, we are looking for two words satisfying the following principles:
- The words are all standard, understandable English (or Swahili, Hungarian or any other language)
- The words share a common part – let’s call it the connector. The part is at the end of the first word and at the beginning of the second (in our example, this is les)
- The remaining parts still make sense as words once we remove this common part: map and sons
- For my application, I wanted the connector to be at least a few characters long and not an English word in itself. It certainly could be, but I just find the jokes more clever without the obvious ones like shuttles and sticks (connector: s) or workbook and bookworm (connector: book). The sketch after this list illustrates the kind of check we are after.
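To make the requirements concrete, here is a minimal brute-force sketch of the check for a single pair of words. The helper name and the tiny word set are purely illustrative, not part of the final program:

def split_pair(first, second, min_len=3):
    # Toy word set; the real program uses a full English dictionary
    words = {'map', 'sons', 'maples', 'lessons'}
    # Try every suffix of `first` as a candidate connector
    for i in range(1, len(first)):
        connector = first[i:]
        if len(connector) < min_len or not second.startswith(connector):
            continue
        head, tail = first[:i], second[len(connector):]
        # Requirements: both remainders are words, the connector itself is not
        if head in words and tail in words and connector not in words:
            return head, connector, tail
    return None

print(split_pair('maples', 'lessons'))  # ('map', 'les', 'sons')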
The naive option would be to go through all of the words in the English language and check whether any of them has an overlapping substring with any other word. The English language has around 170 000 words (depending on who you ask and where you draw the line on compound words), and even by limiting ourselves to the more useful ones we end up with a dictionary of tens of thousands of words. I used a dictionary with 58 109 words found here. The complexity of going through all of these words, matching them against each other and finding the overlapping substrings is O(m²·n²), where n is the number of words and m the average length of a word. Can we do better than brute force? After studying a bit I decided to implement a Trie, which brings this down to a single linear pass over the dictionary. If you are not familiar with Tries, they are a family of search trees whose keys are strings: every character splits the tree into further subtrees, and the last letter of each word ends at a node marked as a leaf, holding the value to be fetched. In a sense, they are compact hash tables whose keys are strings. In our application we are not interested in using the structure for data fetching, but rather in building a queryable structure which contains all of our word logic. The following image clarifies the system:

The blue nodes are common roots for ask and ash as well as care and cart. Additionally, as and car are valid English words. Thinking back to our list of prerequisites, we could simply store whether a node has children and whether it is a word in itself – this would instantly satisfy the first three points. Collecting all of the words from a trie built this way is O(n) in complexity.
Clearly this only solves the first half of our problem: we still need a way to account for common endings of words. But we'll come back to this later. Here is the implementation of the Trie, which I modified to store metadata instead of leaf values during construction.
class trieNode():
    def __init__(self, letter = None):
        # Keep track of children as letters and as nodes
        self.children = []
        self.children_nodes = []
        self.is_leaf = False
        self.letter = letter
        # Utility: store the "word" up to this point,
        # as well as all the child strings that follow it.
        # This exchanges some memory for easier lookup later.
        # Additionally, we store how many times a node has been visited.
        self.string = None
        self.child_strings = []
        self.visited = 0

class Trie():
    def __init__(self):
        self.root = trieNode()
        self.n_nodes = 0

    def insert(self, key):
        # Start at the root
        pointer = self.root
        idx = 0
        for i in key:
            if i in pointer.children:                # If the child exists,
                order = pointer.children.index(i)    # simply move forward
                pointer = pointer.children_nodes[order]
            else:                                    # Otherwise create and append a new node
                pointer.children.append(i)
                pointer.children_nodes.append(trieNode(i))
                self.n_nodes += 1
                pointer = pointer.children_nodes[-1]
            pointer.string = key[0:(idx + 1)]
            # Update the other values
            pointer.visited += 1
            # ...and mark whether the node ends a word, or store the remainder that follows it
            idx += 1
            if idx == len(key):
                pointer.is_leaf = True
            else:
                pointer.child_strings.append(key[(idx):len(key)])
This gives us a tree model where each node keeps track of whether it ends a word, whether it has children and what those children are. As an added bonus, building the structure is a single-pass operation, after which we can endlessly query it for nodes that end a word (requirements 2 & 3) and have continuations of varying lengths (requirements 3 & 4). For my dictionary of 58 109 words this resulted in a trie of 143 638 nodes, or roughly 85 000 extra nodes over the number of words, and building it only took a couple of seconds. It is really quite astonishing how densely you can pack a dictionary into a data structure like this.
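As a quick sanity check, here is roughly how the structure behaves on a handful of toy words (the word list is purely illustrative):

toy_trie = Trie()
for w in ['as', 'ask', 'ash', 'car', 'care', 'cart']:
    toy_trie.insert(w)

# The node for "as" ends a word and still has children ("k" and "h"),
# so it records both the word so far and the remainders that follow it
node = toy_trie.root
for letter in 'as':
    node = node.children_nodes[node.children.index(letter)]
print(node.is_leaf, node.string, node.child_strings)  # True 'as' ['k', 'h']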
If at any point you want to follow along with the code, you can check out the Colab here. I will only paste the more interesting bits of code in the article, so go ahead and copy your own playground for the full experience.
Here come the wordplays
Remember when I promised to get back to how we are going to come up with the word endings? Now's the time. So far we have a method for finding common beginnings followed by a number of different endings. Give a second of thought to how we might achieve the opposite: finding common endings preceded by a number of different prefixes.
…
That’s right, we can reuse what we already have. If we simply flip each word and build another Trie, we end up with a structure of common endings and the prefixes that lead into them. There is a bit of bookkeeping behind the scenes flipping words back and forth, but finally we can put everything together.
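The trick in one line: reversing every word turns a shared ending into a shared beginning, so the exact same Trie machinery applies. A tiny illustration (the words are just examples):

# "sons" and "lessons" share the ending "sons"; reversed, they share the prefix "snos"
print('sons'[::-1], 'lessons'[::-1])  # snos snossel
# So inserting word[::-1] into a second Trie lets us reuse the same prefix logic for endings

With that, here are the two crawls that collect the wordplay candidates from the normal and the flipped Trie: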
def create_dict(trie):
    result = {}
    def crawl(trie):
        if len(trie.children_nodes) == 0:
            return
        # A node that ends a word and still has children marks a prefix word;
        # every child string after it is a candidate connector
        if trie.is_leaf and len(trie.children) > 0:
            for child_string in trie.child_strings:
                if child_string not in result.keys():
                    result[child_string] = []
                result[child_string].append(trie.string)
        for child in trie.children_nodes:
            crawl(child)
    crawl(trie)
    return result

def create_flipped_dict(trie):
    result = {}
    def crawl(trie):
        if len(trie.children_nodes) == 0:
            return
        # Same crawl on the flipped Trie, but flip the strings back so the keys read left to right
        if trie.is_leaf and len(trie.children) > 0:
            for child_string in trie.child_strings:
                flipped_string = child_string[::-1]
                if flipped_string not in result.keys():
                    result[flipped_string] = []
                result[flipped_string].append(trie.string[::-1])
        for child in trie.children_nodes:
            crawl(child)
    crawl(trie)
    return result
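On a toy word list the two crawls produce exactly the tables we want: connectors mapped to the prefix words they follow, and to the suffix words they precede (again, the word list is illustrative):

prefix_trie = Trie()
suffix_trie = Trie()
for w in ['map', 'maples', 'sons', 'lessons', 'work', 'workbook', 'book']:
    prefix_trie.insert(w)
    suffix_trie.insert(w[::-1])

print(create_dict(prefix_trie.root))          # {'les': ['map'], 'book': ['work']}
print(create_flipped_dict(suffix_trie.root))  # {'les': ['sons'], 'work': ['book']}
# The only connector present in both tables is 'les', giving the pair map + les + sons

Wrapping everything into a class gives us the generator below.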
import random
import pandas as pd

class jokeGenerator():
    def __init__(self):
        self.trie = Trie()
        self.flipped_trie = Trie()
        self.words = None
        self.result = None
        self.flipped_result = None
        self.common_keys = None
        self.wordplays = None
        self.tokenizer = None
        self.model = None

    def loadWords(self, source):
        words = pd.read_csv(source, na_filter = False)
        words = words.values.tolist()
        words = [x[0] for x in words]
        print(f'Loading {len(words)} words')
        i = 0
        n_words = len(words)
        for word in words:
            i += 1
            if i % int(n_words/10) == 0:
                print(f'{int((i+1)/n_words*100)}% ({i}/{n_words})')
            self.trie.insert(word)
            self.flipped_trie.insert(word[::-1])
        print(f'Done')
        self.words = words

    # normal: all words
    # not_short: the connector is longer than 2 characters
    # not_a_word: all words, where the connecting word is not a word in itself
    # not_a_short_word: all words, where the connecting word is not a word in itself and it is more than 2 characters
    # not_a_short_word_or_ing: all words, where the connecting word is not a word in itself, it is more than 2 characters and it is not "ing"
    def generateWords(self, type = 'normal'):
        if self.words is None:
            print('You must load the words first: loadWords(source)')
            return
        self.flipped_result = create_flipped_dict(self.flipped_trie.root)
        self.result = create_dict(self.trie.root)
        common_keys = list(set(self.result.keys()).intersection(self.flipped_result.keys()))
        if type == 'normal':
            self.common_keys = common_keys
        elif type == 'not_short':
            self.common_keys = [x for x in common_keys if (len(x) > 2)]
        elif type == 'not_a_word':
            self.common_keys = [x for x in common_keys if (x not in self.words and x != '-')]
        elif type == 'not_a_short_word':
            self.common_keys = [x for x in common_keys if (x not in self.words and x != '-' and len(x) > 2)]
        elif type == 'not_a_short_word_or_ing':
            self.common_keys = [x for x in common_keys if (x not in self.words and x != '-' and x != 'ing' and len(x) > 2)]
        self.wordplays = {}
        for c_key in self.common_keys:
            for r in self.result[c_key]:
                for f_r in self.flipped_result[c_key]:
                    self.wordplays[f'{r}_{c_key}_{f_r}'] = [f'{r}', f'{c_key}', f'{f_r}']
For the beginning – end pairs collected this way we can simply search for common connectors (the intersection of the prefix and suffix keys), as illustrated here:

And ta-da, we have our list of homographic puns ready to go. Quite a lot of them, in fact: 10 174 817 is our final count. Most of these do not fit our arbitrary fourth requirement, so let’s take it down a bit. You might notice that the generateWords method has a number of filters. These constrain the length of the connector, remove connectors that are words in themselves, and drop connectors that are over-represented (such as "ing", which both ends and begins a load of English words). With our strictest criteria we end up with around 10 000 wordplays. You can find the full list here: https://oskarniemenoja.fi/files/wordplays.txt, but here are a few of my favourites:
['absorb', 'ent', 'rapping'], or absorbent entrapping
['strut', 'ter', 'rains'], or strutter terrains
['hi', 'res', 'cued'], or hires rescued
['environ', 'ment', 'ally'], or environment mentally
['hips', 'ter', 'race'], or hipster terrace
['spa', 'res', 'ponds'], or spares responds
['for', 'mal', 'formation'], or formal malformation
['for', 'mer', 'its'], or former merits
['sit', 'com', 'posers'], or sitcom composers
['by', 'tes', 'table'], or bytes testable
We could go on and on, but these serve as a tireless muse to fuel puns of varying quality. And best of all, there are up to a whopping 10 million of them to pick from.
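Putting the pieces so far together looks roughly like this. The file name is a placeholder for whatever word list you use, and eng_gen is simply the generator instance referred to later on:

eng_gen = jokeGenerator()
eng_gen.loadWords('english_words.csv')            # placeholder path to a one-column word list
eng_gen.generateWords('not_a_short_word_or_ing')  # the strictest filter described above
print(len(eng_gen.wordplays))                     # roughly 10 000 with the strict filter
print(random.choice(list(eng_gen.wordplays.values())))  # e.g. ['hips', 'ter', 'race']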
Sounds like a job for natural language processing
Now this is all well and good: we have a machine spitting out pun-ilicious word pairs at just about any rate you can imagine. But coming up with the actual jokes still falls to the human. Can we automate this further and build a real end-to-end pipeline for generating humor? Of course we can. Let’s append a couple of methods to our class:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

def loadModels(self, language = 'english'):
    if language == 'finnish':
        self.tokenizer = GPT2Tokenizer.from_pretrained('Finnish-NLP/gpt2-finnish')
        self.model = GPT2LMHeadModel.from_pretrained('Finnish-NLP/gpt2-finnish')
    elif language == 'english':
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2-large')
        self.model = GPT2LMHeadModel.from_pretrained('gpt2-large')
    # Adding new languages is laughably easy. Search for a passable model on Huggingface,
    # chuck it here and let the script do its magic.

def generateJoke(self, first_string = '', second_string = '', n = 1, length = 30):
    # Pick a random wordplay and wrap it into a prompt
    joke = self.wordplays[random.choice(list(self.wordplays.keys()))]
    joke_words = joke[0] + joke[1] + ' ' + joke[2]
    joke_string = first_string + ' ' + joke_words + ' ' + second_string
    input_ids = self.tokenizer.encode(joke_string, return_tensors = 'pt')
    output = self.model.generate(input_ids,
                                 max_length = length,
                                 num_beams = n,
                                 num_return_sequences = n,
                                 no_repeat_ngram_size = 3)
    result = []
    for op in output:
        result.append(self.tokenizer.decode(op))
    return result
These load a pre-trained transformer model from Huggingface and generate a string using a prompt built from our pre-generated wordplays. I have provided two languages, but feel free to test your own and share your experiences in the comments. Suffice to say, the same principle should work for any language: feed the Tries a dictionary in your target language and fetch a transformer trained in that language. In practice, however, I noticed the Finnish version was considerably worse than the English one, so your mileage may vary.
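One way to make the language swap explicit is a small generic loader. This is a sketch rather than something from the original Colab: loadModelsGeneric is a hypothetical helper, and the model id is whatever causal language model you pick from the Huggingface hub for your target language.

from transformers import AutoTokenizer, AutoModelForCausalLM

def loadModelsGeneric(self, model_name):
    # model_name is any causal-LM checkpoint id on the Huggingface hub,
    # e.g. a GPT-2 variant trained on your target language (your choice)
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
    self.model = AutoModelForCausalLM.from_pretrained(model_name)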
Even so, we are a step closer to true AI humor. Let’s see what this bad boy spits out.
Prompt: Here is a funny story about environment ions and environ mentions
Result: that I found on the internet. It is a story about a man who was trying to get a job in a chemical plant. He was told that the job was for a man with a PhD in chemistry. He went to the job interview and was told he was not qualified. He then went to his supervisor and said that he was qualified.
In an endearing way this is somewhat close to what early 20th century humor was like – contradicting societal norms and the expectations of the reader. It also picks up on the context of the joke being about environmental chemistry, and even comments on the joke being "found on the internet". While it’s not winning any awards, it’s understandable and could plausibly pass for something somebody with a bad sense of humor wrote. And who could be angry at this cute little robot just trying its best:
Prompt: Tell me a joke about entrainment and mentally
Result: weirdness. I'm not sure if I'm going to be able to do this. I'm not a comedian. I don't have a lot of experience with comedy. I've never done a standup show. I have no idea what I'm doing. I just want to do it. I want to be a comedian
Nonetheless, this is not humor as we understand it. It seems the transformer has not been trained sufficiently on jokes, and instead tries to write prose as best it can. I spent some time looking for a model trained specifically on jokes, but it did not yield much better results. Can we do any better? As it turns out, we can.
Cue the OpenAI GPT-3
The lovely folks at OpenAI have put their GPT-3 model into open beta, and while it is not free (it costs around 0,003€ to generate a single joke, or 3€ for every 1000 jokes) it fares considerably better with the nuances of human speech. You can go sign up for your API key here. Let’s modify the code to query the API instead.
import os
import openai

# Pick a random wordplay and spell out both readings of it
words = eng_gen.wordplays[random.choice(list(eng_gen.wordplays.keys()))]
input_word = ''.join(words[0:2]) + ' ' + words[2] + ' and ' + words[0] + ' ' + ''.join(words[1:3])

openai.api_key = "api_key_here"
result = openai.Completion.create(
    engine = "text-davinci-002",
    prompt = f"Write a joke containing the words '{input_word}'",
    max_tokens = 60,
    temperature = 0.3,
    n = 1,
    echo = True
)
result.to_dict()['choices'][0]['text']
Let’s see how it plays out
Prompt: Write a joke containing the words 'potent angle and pot entangle'
Result: What do you call a potent angle that's also good at entangling? A pot-entangle!
Would you look at that. Not only does it understand the typical phrasing of a joke, it ties together nice little connotations from the words themselves. This actually got a chuckle out of me, and it is 100% procedurally generated. I could mash this button and get served an endless number of jokes. Actually, before we finish, let’s tidy up and do just that.
def create_jokes(wordplays, n, api_key):
    results = pd.DataFrame({'input' : [], 'prompt' : [], 'answer' : []})
    openai.api_key = api_key
    for i in range(n):
        words = wordplays[random.choice(list(wordplays.keys()))]
        input_word = ''.join(words[0:2]) + ' ' + words[2] + ' and ' + words[0] + ' ' + ''.join(words[1:3])
        prompt = f"Write a joke containing the words '{input_word}'"
        result = openai.Completion.create(
            engine = "text-davinci-002",
            prompt = prompt,
            max_tokens = 60,
            temperature = 0.3,
            n = 1
        )
        answer = [input_word, prompt, result.to_dict()['choices'][0]['text']]
        results.loc[len(results)] = answer
    return results

create_jokes(eng_gen.wordplays, 10, "an_api_key")
Let’s see what it has in store for us.
Why did the cables sons take cab lessons? So they could get a job as a taxi driver!
I was going to use a cleanser rated and clean serrated knife to make dinner, but then I realized that would be overkill.
What do you call a bagger who's also a manic German? A Bagger Manic Germanic!
What do you call a superintendent who's good at enticement? A superintend-ice-nt!
These are surprisingly not bad. Granted, I had to hit the button a few times to get a respectable set of jokes, but nonetheless these all came out without a single human twisting and turning words. Feel free to try out the Colab yourself and see what you come up with.
Closing words
All in all, this went much better than I initially expected. I set out to create a nice dictionary of word pairs to fuel my own imagination. With some clever data structures we now have an endless supply of those in any language we want. Plugging these into off-the-shelf NLP models yielded surprisingly good results. Are any comedians going out of business because of this AI? Probably not. But I could easily see a niche tour consisting of a joke-writing robot and a speech synthesiser co-performing on stage. Or books filled with procedurally generated jokes. These types of programs could help writers, comedians or screenwriters overcome blank-page syndrome, or simply serve as an introduction to how robots can achieve surprisingly lifelike results.
Humor is one of the defining traits of being human. It would be hard to argue that machines feel emotions, be it sadness, joy or indeed laughter. A few days ago I would have said they lacked the contextual awareness to create clever wordplays and twist language into new meanings. This experiment has been a fun little exercise, but also a surprisingly deep delve into what being human really means. Watching a robot write understandable and at times even clever humorous prose feels like breaking a wall I did not know existed. It feels more like co-working with a computer than simply instructing it on what to do.