Generating Pokémon names using RNNs

Yan Gobeil
Towards Data Science
7 min read · Jul 3, 2019


An example of character-level word generation

A while ago, when I was finishing the Deep Learning specialization on Coursera, I decided to try to implement some of the things that I learned. As everyone who took these courses knows, the theory is incredible but the practice is pretty much just pre-made code with some blanks to fill in. I don't think that is a very good way to learn, so I wanted to code it all from scratch. The example that attracted me the most was generating words that look like the data the model is given. The course used dinosaur names and I decided to use Pokémon names instead. I used Keras with Python to generate names of the likes of entor, incono and areacita. The code can be found on my GitHub.

The model

In order to generate names character by character like I did, I had to use recurrent neural networks (RNNs). These are networks that take in a sequence of inputs, one per “time” step. Each input passes through one or many layers of neurons before being converted into an output. The weights used to go from the input to the neurons, from the neurons to the output, and from the neurons at one time step to the neurons at the next are shared across all time steps. This allows the network to model situations where later elements of a sequence depend on the previous ones, as in time series or text.

Example of 3-to-3 RNN. Each input X is converted into neurons h by using weights W. The neurons are then converted into outputs y using a different set of weights and are passed to the next time step using another set of weights.

There are various possible input/output combinations for RNNs, with varying numbers of elements. For example, sentiment analysis, which classifies text as positive or negative, needs a network with many inputs (say one per word) and only one output. To generate names, the architecture used is many-to-many, with the same number of outputs as inputs.

The data fed to this model is, as input, the names decomposed into characters. The output for each time step (here, one time step = one character) is the next character in the sequence. For example, if the input is “pikachu” the corresponding output is “ikachu ”. Note the space at the end of the output, which makes input and output have the same number of characters. This is important because the neural network has a fixed size and all the data must have that size. The space also tells the model that the name has ended.

The reason the data needs to be formatted this particular way is that, at prediction time, the model predicts the letters of a new name one after the other. I give the model an empty character (not a space!) as the first input, and the first output is a probability distribution over characters that I use to sample the first letter of the name. This letter is then fed in as the second input to generate a probability distribution for the second character, and I continue until I obtain a space, which marks the end of the name.

Collecting and cleaning the data

In order to train such a generative model I needed as much data as possible. In this case the data was simply the list of names of all the Pokémon that exist. It turns out that there is a very nice website with an API for collecting all kinds of data about Pokémon. I only needed the names, so it was very easy to use.
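
As a rough sketch, the collection step could look something like this, assuming the PokéAPI endpoint (https://pokeapi.co); the limit of 1000 is just a value chosen to be larger than the number of entries:

import requests

# Query the API for the full list of Pokémon names.
response = requests.get("https://pokeapi.co/api/v2/pokemon?limit=1000")
names = [entry["name"] for entry in response.json()["results"]]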

This resulted in a list of 964 names, but with some problems. Many Pokémon are included in the list multiple times: some have mega evolutions that appear as ‘name-mega’, and many other variants do the same thing. The simplest solution I found was to remove everything after a dash. However, some real names contain dashes, so I had to take care of those before cleaning. For example, I had to manually change ‘mr-mime’ to ‘mr mime’. One last problem was ‘porygon-2’: since I didn’t want to include numbers in my allowed characters, I converted this Pokémon name to ‘porygon two’.

After taking care of all these details I performed the split at dashes, keeping only the first part of each name. This led to many names appearing in the list multiple times, so I had to remove the duplicates. This gave a final count of 806 Pokémon names.

The last important step was to add a period at the end of each name. I had to do this because some names contain spaces, so I couldn’t use a space as the indicator that a name is finished. The period does this job instead.
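
The whole cleaning can be sketched as follows; the manual replacements shown are only the examples mentioned above, not necessarily the full list:

# Manually fix names whose dashes are part of the real name.
names = [name.replace('mr-mime', 'mr mime').replace('porygon-2', 'porygon two')
         for name in names]

# Split at dashes to drop '-mega' and similar suffixes, then deduplicate.
names = sorted(set(name.split('-')[0] for name in names))

# Add a period to mark the end of each name.
names = [name + '.' for name in names]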

So far I have only treated the names as lists of characters, but for the neural network to understand these characters it was important to one-hot encode them. This means that each character is represented as a vector with the same size as the number of distinct characters I consider: in this case a-z plus the space and the period, so 28 characters. Every entry of this vector is zero except the one that corresponds to the encoded character. Each character in a word is encoded this way and the vectors are stacked into a matrix that represents the whole name. Since the RNN has a fixed size, this matrix must be big enough to encode the longest name, so shorter names are padded with null vectors at the end. I chose the convention that each row of the matrix is a character, so the matrix has size (max_char, char_dim), where max_char is the maximum number of characters in a name and char_dim is the dimension of the space of characters.

To code this, I first created a dictionary to map between characters and one-hot encoding indices. Then I defined the important quantities and finally created the training data described above.
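
A minimal sketch of this step, assuming the cleaned list names from above (variable names are illustrative):

import numpy as np

# Dictionaries to go between characters and one-hot indices.
chars = sorted(set(''.join(names)))  # a-z, space and period: 28 characters
char_to_index = {c: i for i, c in enumerate(chars)}
index_to_char = {i: c for i, c in enumerate(chars)}

char_dim = len(chars)
max_char = max(len(name) for name in names)

# X holds the inputs and Y the outputs, shifted by one time step.
# The first input of every name is left as the null vector, matching
# the "empty character" used at prediction time; every later input is
# the previous target character.
X = np.zeros((len(names), max_char, char_dim))
Y = np.zeros((len(names), max_char, char_dim))
for i, name in enumerate(names):
    for t, char in enumerate(name):
        Y[i, t, char_to_index[char]] = 1
        if t + 1 < max_char:
            X[i, t + 1, char_to_index[char]] = 1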

One thing to note is that the data is not split into training and test sets. This is not a usual supervised learning task with a fixed expected output to evaluate against, so I was able to use all the data to train the model.

Training the model

With the data prepared, the last step was to create and train the recurrent network.
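
The model itself can be sketched in a few lines of Keras; the layer size of 128 here is an assumption, since the actual value was chosen by inspecting the generated names:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# LSTM layer returning an output at every time step (many-to-many).
model.add(LSTM(128, input_shape=(max_char, char_dim), return_sequences=True))
# Softmax over the characters at each position.
model.add(Dense(char_dim, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')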

I decided to go with an LSTM layer because I find it is the most common and generic one. Long short-term memory layers are composed of a few different activations combined so as to remember what happened in previous time steps; the layer learns by itself how far back in the past to look. The size of the layer was chosen by training the model and seeing which value generated the nicest results. The dense output layer is composed of a sequence of softmax activations that give the probability of each character appearing at each position. I used the Adam optimizer, again because it is the most common one, and the categorical cross-entropy loss because of the softmax activation at the output.

Since the goal here was not to predict a fixed output, it was harder to evaluate the progress of the learning; no standard metric fits such a problem. Instead, I decided to generate a few names at each epoch while training to judge how the model was doing.
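
Here is a sketch of such a generation function, using the dictionaries and dimensions defined earlier (the details are illustrative):

def generate_name(model):
    name = ''
    x = np.zeros((1, max_char, char_dim))  # start from the null input
    i = 0
    while True:
        # Probability distribution over characters at the current step.
        probs = model.predict(x)[0, i]
        index = np.random.choice(char_dim, p=probs / probs.sum())
        char = index_to_char[index]
        name += char
        i += 1
        if char == '.' or i == max_char:
            break
        x[0, i, index] = 1  # feed the sampled character back as input
    return name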

This function implements the generation procedure described above. It starts by feeding the null input to the model and takes the first output as a probability distribution from which to sample a character, which becomes the first element of the generated name. The partial name is fed back to the network, and the second output is used as the probability distribution to sample the second character. This continues until a period is sampled or until the word reaches the size of the network.

This word-generating function was used as a Keras callback while training the model.
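
With Keras this can be done with a LambdaCallback; the total of 300 epochs is an assumption based on the outputs shown below:

from keras.callbacks import LambdaCallback

def print_names(epoch, logs):
    # Generate a few sample names every 25 epochs.
    if epoch % 25 == 0:
        print('Names generated after epoch {}:'.format(epoch))
        for _ in range(3):
            print(generate_name(model))

name_callback = LambdaCallback(on_epoch_end=print_names)
model.fit(X, Y, epochs=300, callbacks=[name_callback])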

This generates three names every 25 epochs during training. A few examples of names obtained are

Names generated after epoch 0:
dxjaemprwpk.
zykhv.
uzvlzwbvisa.
Names generated after epoch 125:
ugerli.
yunof.
mhanurs.
Names generated after epoch 275:
urcono.
iggyy.
louk.

Clearly the untrained model generates junk, but it quickly starts producing viable words. However, the words generated after 125 epochs don't look at all like what we expect from Pokémon, while the names at the end of training look much more like real Pokémon names. Real names even appear occasionally, so the model is close to overfitting. Depending on the goal, both situations can be useful.

If you read this and found it interesting or useful, thank you very much! Don't hesitate to leave comments or ask any questions you may have. I still have many more projects in mind that I want to do and write about :)


I am a data scientist at Décathlon Canada working on generating intelligence from sports images. I aim to learn as much as I can about AI and programming.