How to Create a Codenames Bot Part 1: Word2vec

Jeremy Neiman
Towards Data Science
6 min read · Dec 23, 2017


As a board game enthusiast and programmer, I thought that designing an algorithm to play the popular game Codenames would be an interesting, if not exactly worthy, endeavor. In this series of blog posts I will share my various attempts at generating the word association-based clues that are integral to Codenames. So! Let’s get started.

Codenames Explained

Players are split into two teams, red and blue. The board consists of 25 randomly selected word tiles arranged in a five-by-five grid, like so:

An example board (http://spiralingcadaver.blogspot.com/2017/09/everybody-knows-that-bird-is-word.html)

Each team designates one player as its spymaster, and only the spymasters know which cards belong to which team. Each spymaster gives clues that lead her team to guess its own cards while avoiding the other team’s.

For the above board, the red spymaster might give a clue such as “dish 2.” “Dish” is the associated word and “2” is the number of cards associated with it, in this case “plate” and “washer.”

“Weight 3” might be a bad clue because, while it could refer to “press,” “force” or “plate,” it could also refer to blue’s card “scale.” If a team accidentally guesses their opponent’s card, their turn is over.

Spymasters take turns giving clues until one team has guessed all of its words.

A few other important rules:

  • “Theater,” highlighted in black, is the assassin, which, if guessed, means the team that guessed it loses immediately. So a spymaster must keep her team away from that card at all costs.
  • Clues must be one word or a proper noun such as a name.
  • As cards are guessed, they are covered with an associated card — blue, red, black (assassin) or white (neutral).
  • A clue can’t contain, or be a form of, any word visible on the board. For example, “wavy” and “bellhop” are invalid clues; they become legal once the corresponding card has been covered by a guess.

If you have 45 minutes to kill and want a better sense of the game, TableTop made an episode about it.

The task of creating a spymaster bot essentially boils down to automating word associations. Given word X, what are the most similar words? Then, given a collection of words X, Y and Z, what are the words most similar to all of them? All the while avoiding words similar to my opponent’s words A, B and C. And not part of any other word visible on the board. And, certainly not related in any way to the assassin.

But, to start with, the essence of the problem is finding word associations. There are a number of methods for doing that, which we’ll look at over the course of this series. The first is word2vec.

Word2vec: exactly what it sounds like

Word2vec is a way of representing a word, such as “cat,” as a vector of numbers (0, 5, 1, 3, 7, 1, …). I won’t go into detail about how these vectors are created, but this article does a good job of explaining it.

Once created, the vectors have a few interesting and useful properties. Most importantly to us, related word vectors tend to be located close together in the vector space. For example, the vector for “cat” might be close to words such as “kitten,” “pet” and “dog.” This is exactly what we’re looking for. By finding the closest vectors, we’re finding the most related words.
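A note on what “close” means here: the standard measure, and the one gensim reports, is cosine similarity, the cosine of the angle between two vectors. A minimal sketch with numpy (an illustration, not part of the original code):

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between u and v: 1.0 means the vectors
    # point in the same direction, 0.0 means they're orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))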

Methodology

(The following code and additional information can all be found in a Jupyter notebook which can be easily copied and run.)

I used Python 3, gensim and a word2vec model pretrained on a Google News corpus. First, we load the model, limiting it to the 500,000 most common words to filter out some of the nonsense words.

import gensim

# Load the pretrained Google News vectors, limited to the
# 500,000 most common words
model = gensim.models.KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True, limit=500000
)
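Each word in this model maps to a 300-dimensional vector. A quick sanity check once loading finishes:

vec = model['ambulance']
print(vec.shape)  # (300,)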

Here’s the example Codenames board we’ll be working with: blue holds one team’s words, red the other’s, and assassin is the assassin word.

board = {
    'blue': [
        'ambulance', 'hospital', 'spell', 'lock',
        'charge', 'tail', 'link', 'cook', 'web'
    ],
    'red': [
        'cat', 'button', 'pipe', 'pants',
        'mount', 'sleep', 'stick', 'file', 'worm'
    ],
    'assassin': 'doctor'
}

First, given one word, what are the most similar words? To get the 10 most similar words to “ambulance”:

model.similar_by_word('ambulance', topn=10)

which yields:

[('paramedics', 0.7590752243995667),
('ambulances', 0.7493595480918884),
('Ambulance', 0.7236292362213135),
('paramedic', 0.662133514881134),
('Ambulance_paramedics', 0.6315338611602783),
('Ambulances', 0.6211477518081665),
('LifeFlight_helicopter', 0.6147335171699524),
('hospital', 0.6099206209182739),
('Paramedics', 0.6081751585006714),
('Ambulance_Service', 0.6080097556114197)]

Each line is the word, followed by how similar the word is to “ambulance.” Some of these words would be useful, “paramedics” for instance, but many are just other forms of the word “ambulance.”
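A quick, naive way to screen out those near-duplicates, just as an illustration (the real fix comes later in this series): drop any candidate that contains the clued word after lowercasing.

candidates = model.similar_by_word('ambulance', topn=10)

# Naive de-duplication: discard candidates like 'Ambulance' and
# 'ambulances' that contain the original word
filtered = [(w, sim) for w, sim in candidates
            if 'ambulance' not in w.lower()]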

gensim also lets us directly find the words most similar to a whole group of words at once.

# Find the words most similar to all of blue's words at once,
# restricted to the 50,000 most common words
model.most_similar(
    positive=board['blue'],
    restrict_vocab=50000,
    topn=20
)

yields

[('For_Restrictions', 0.43488097190856934),
('bed', 0.39588358998298645),
('links', 0.38411831855773926),
('hook', 0.38367366790771484),
('paramedics', 0.38072746992111206),
('emergency', 0.37950167059898376),
('jail', 0.3759669065475464),
('log', 0.37062549591064453),
('intensive_care', 0.3661930561065674),
('call', 0.36543411016464233),
('webpage', 0.3649423122406006),
('tow_truck', 0.3592333197593689),
('click', 0.35906946659088135),
('cooked', 0.3552851676940918),
('care', 0.3537469208240509),
('handcuff', 0.35027384757995605),
('then', 0.34921103715896606),
('stay', 0.3478427529335022),
('turn', 0.34607696533203125),
('bookmark', 0.3458564579486847)]

This looks much better, and produces some decent clues.

  • “bed,” “paramedics,” and “emergency” all relate to “ambulance” and “hospital.”
  • “jail” could relate to “lock” and “charge.”
  • “click” to “web” and “link.”

But “bed” would also relate to the other team’s word “sleep,” and “click” to “button.” It would be bad to give clues which could point to the opponent’s cards.
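We can check those cross-team associations directly; model.similarity returns the cosine similarity between two words:

# How strongly does a candidate clue pull toward the opponent's words?
model.similarity('bed', 'sleep')
model.similarity('click', 'button')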

gensim also accepts negative examples, which helps steer clues away from the opponent’s words.

# Blue's words as positive examples, red's words as negative
model.most_similar(
    positive=board['blue'],
    negative=board['red'],
    restrict_vocab=50000
)

produces

[('Hospital', 0.27265793085098267),
('ambulances', 0.2605472207069397),
('hospitals', 0.24624229967594147),
('outpatient', 0.24339225888252258),
('inpatient', 0.2404019981622696),
('paramedics', 0.23482689261436462),
('escort', 0.23161748051643372),
('Partnerships', 0.23104971647262573),
('Medical_Center', 0.2306305170059204),
('telemedicine', 0.22638411819934845)]

I really like the clue “telemedicine.” It’s not an obvious connection, but it relates to four words: “web,” “link,” “ambulance” and “hospital.” This shows the method’s potential to produce novel clues.
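It helps to know roughly what most_similar is doing with those negative examples: gensim unit-normalizes each input vector, takes the mean of the positives together with the negated negatives, and ranks the vocabulary by cosine similarity to that single query vector. A rough sketch of the equivalent query construction (an approximation, not gensim’s exact code):

import numpy as np

def query_vector(model, positive, negative):
    # Approximate gensim's most_similar query: unit-normalize each
    # word vector, negate the negatives, and average everything
    unit = lambda v: v / np.linalg.norm(v)
    vecs = [unit(model[w]) for w in positive]
    vecs += [-unit(model[w]) for w in negative]
    return np.mean(vecs, axis=0)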

Let’s say the clue was “telemedicine” and those four words were removed from the board; then the next team gets a turn. What might their clues be?

board = {
    'blue': ['spell', 'lock', 'charge', 'tail', 'link'],
    'red': [
        'cat', 'button', 'pipe', 'pants',
        'mount', 'sleep', 'stick', 'file', 'worm'
    ],
    'assassin': 'doctor'
}

# Now the red spymaster looks for a clue, steering away from blue's words
model.most_similar(
    positive=board['red'],
    negative=board['blue'],
    restrict_vocab=50000
)

This returns:

[('pillow', 0.43686941266059875),
('bra', 0.3842337727546692),
('couch', 0.38342970609664917),
('tub', 0.37922778725624084),
('closet', 0.36959999799728394),
('sofa', 0.36713898181915283),
('bathroom', 0.366258829832077),
('bed', 0.36348700523376465),
('crotch', 0.36245280504226685),
('spoon', 0.36179912090301514)]

This appears much less successful. The top words mostly seem to relate to just a single word each, and even then they’re stretches:

  • “pillow” relates to “sleep”
  • “bra” possibly to “pants” because they’re both clothes?
  • “couch”? Maybe to “sleep” or “cat”?

Key Takeaways

There is definitely potential here. Given that this was all done in just a few lines of code, it’s fairly impressive how well it performs. But there are a number of issues with this method as it stands:

  • The goal as a Codenames spymaster is to have your team be the first to guess all of its words, so speed matters: good clues should, whenever possible, relate to as many of your words as possible. We see that in the first set of clues (“paramedics,” “telemedicine”) but not in the second (“pillow,” “bra”). This happens because a single candidate can be far more similar to one word than any candidate is to a whole group of words. In Codenames, though, it is often better to pick a word that is related enough to several of your words to make the intended set clear than to pick the single most similar word. This naive attempt has no preference for maximizing the number of words; one way to add that preference is sketched after this list.
  • The rules need to be enforced: “ambulances” can’t be given as a clue for “ambulance,” and multi-word phrases must be removed (unless they’re proper nouns).
  • It’s not clear how many words are “related” to a given clue. The clue must be composed of a word and a number, such as “telemedicine 4,” so the final algorithm must have a way to choose that number as well.
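One direction for tackling the first two issues, as a sketch only: score candidates by how many team words clear a similarity threshold rather than by raw similarity, and reject candidates that break the one-word rule. The threshold value and helper names below are illustrative guesses, not part of the original code.

def is_valid_clue(clue, visible_words):
    # Naive rule check: reject multi-word phrases (the Google News
    # model joins phrases with underscores) and clues that contain,
    # or are contained in, any word still visible on the board
    clue = clue.lower()
    if '_' in clue:
        return False
    return not any(clue in w or w in clue for w in visible_words)

def score_clue(model, clue, team_words, threshold=0.35):
    # Prefer clues related to MORE team words, breaking ties by
    # total similarity; 0.35 is an arbitrary cutoff for "related"
    sims = [model.similarity(clue, w) for w in team_words]
    related = [s for s in sims if s >= threshold]
    return (len(related), sum(related))

Candidates from most_similar could then be filtered with is_valid_clue and re-ranked with score_clue, which would favor a clue like “telemedicine 4” over one that is extremely similar to only a single word.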

Next time we’ll start trying to resolve the issue of invalid clues by building a master dictionary of allowed words.

Thanks to Jonathan Grundy, Robert Hundley, David Leonard and Abigail Pope-Brooks, for editing and feedback.

All the code and data used are available on GitHub. Jeremy Neiman can be found doing equally pointless things on his website: http://jeremyneiman.com/
