Markov chains are probabilistic models that generate values by randomly moving from one state to another. The next state is determined solely by the current state and the transition probabilities out of it. Their simplicity makes them a great introduction to predictive models, with no need for specialized libraries like PyTorch or TensorFlow.
How do Markov Chains Work?
On an abstract level, a Markov chain is a collection of a finite number of states and transitions that describe random changes from one state to another. They can be used to model events where changes occur randomly, but still follow some probability distribution.
Each time a Markov chain records a transition from one state to another during training, it adjusts that transition’s weight to make it more likely to occur during predictions. If the model is trained on data where a transition A→B is twice as likely as A→C, then when the model is in state A, the next state could be B or C, but it will be twice as likely to be B.
A Simple Example
Let’s say we train the Markov chain with the words "BAT" and "BIT". We will use a period (.) to denote the end of the word. This is what the chain will look like:

When the model is used to make a prediction, it performs a random walk.
In a random walk, we start at the ‘start’ state. The only transition from there is to ‘B’ (probability 1.0), so the walk moves to ‘B’. At ‘B’, the next state is equally likely to be ‘A’ or ‘I’. The choice is random (the "random" part of random walk) but still follows the probability distribution. Let’s say the walk went to ‘I’. It then transitions to ‘T’ (probability 1.0), and lastly to the end state (the ‘.’ state). The model assembled the word "BIT", but it was just as likely to assemble "BAT".
This particular Markov chain can’t transition back to earlier states, so it can only generate words it has seen before.
A More Complex Example
Markov chains don’t need to follow a linear flow. Let’s take the chain from the previous example and train it on the word "BIITAT":

There is now a probability that the model will stay in state ‘I’, and a probability that it will transition from state ‘T’ back to state ‘A’. The model can now generate words it has never seen before, like "BATATAT" and "BIIIIIIIT"! These words are less likely to be generated, but still possible. Imagine what it could do if we fed it more data!
As long as there is a period (the end state) at the end of each word we train it with, the random walk will eventually reach the end state.
Generating Fantasy Names
A fun application of Markov chains is generating fantasy names. Given enough data, they will follow the general trends of the training data, constantly shuffling the patterns that make them up. After all, the generated words are just made from patterns of previously seen ones.
It seems to me that brand names of pharmaceuticals always have a zang to them. They have no dictionary meaning, but they typically have recognizable patterns that make them sound like ‘drug names’. We will use a Markov chain to generate names like these.
The Data
The FDA maintains the National Drug Code (NDC) Directory of all approved medications. After extracting the ‘proprietary name’ column from the spreadsheet and filtering out names containing special characters, numbers, or spaces, I was left with a list of 3403 names to use for training.
The link to the GitHub repo that contains the filtered data and raw data is included at the end of the article.
Implementing the Markov Chains
In the previous examples, I have drawn out the Markov chains as graphs with nodes and links. Nodes and links make the chain easy to follow and visualize, but they are impractical to implement. We will be implementing the Markov chain as a nested Python dictionary. The key of the outer dictionary is the ‘current state’; its value is another dictionary representing the transitions, where the key is the ‘next state’ and the value is the number of times that transition has been observed.
While this approach is harder to visualize, it’s significantly easier to access a state by simply doing chain['state'] instead of traversing a graph. Here is an example of what the entry for state "T" would look like for Figure 1.2:
chain['T'] = {
    'A': 1,  # T -> A transition in BIITAT
    '.': 3   # T -> '.' transition in BIT, BAT, BIITAT
}
It also makes sense to store the frequencies of the transitions rather than probabilities: a frequency only needs to be incremented during training, whereas probabilities would have to be recalculated for every transition out of a state after each update.
Building the Chain
This is how we would build the chain in Python:
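A minimal sketch of the training step, assuming the cleaned names are lowercase strings in a list (the names build_chain and names are mine, not necessarily those in the repo):

def build_chain(names):
    chain = {}
    for name in names:
        word = name.lower() + '.'    # '.' marks the end state
        current = ''                 # '' is the start state
        for char in word:
            transitions = chain.setdefault(current, {})
            transitions[char] = transitions.get(char, 0) + 1    # bump the observed frequency
            current = char
    return chain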
Random Walk
Despite the chain having seen thousands of names during training, a plain random walk as described in the examples generates garbage. Most of the generated names have no zang (for example, just "ae"); some rules need to be in place to give the names some funk.
While I said earlier that the next state of a Markov chain depends solely on the current state, for this application it makes more sense to also account for the previous state when checking the rules, so that the generated names come out better.
I am also counting the end character (‘.’) and the empty character (‘’) as vowels; this prevents the words from starting or ending with two vowels.
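A minimal sketch of such a rule check, assuming the rule is "no three vowel-like characters in a row" (the function name allowed() is just for illustration):

VOWELS = set('aeiou') | {'', '.'}    # '' (start) and '.' (end) count as vowels

def allowed(prev_char, current_char, candidate):
    # Assumed rule: reject three vowel-like characters in a row. Because ''
    # and '.' count as vowels, this blocks words that start or end with two vowels.
    return not (prev_char in VOWELS and current_char in VOWELS and candidate in VOWELS)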
The function pick_char(), which makes a weighted random selection of the next state, works like this (a code sketch follows the list):
- The frequencies of all next states are summed up.
- A random value between 0 and 1 is generated and multiplied by the frequency sum, giving a value between 0 and the frequency sum.
- Walking through the frequency dictionary, the frequency of each ‘next state’ is subtracted from the random value. As soon as the random value drops below zero, that ‘next state’ is selected. This picks a state randomly, weighted by how often the transition was observed.
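A minimal sketch of these steps (the exact implementation may differ from the repo):

import random

def pick_char(transitions):
    # transitions maps each possible next state to its observed frequency,
    # e.g. {'A': 1, '.': 3} for the 'T' entry shown earlier.
    total = sum(transitions.values())
    threshold = random.random() * total    # value in [0, total)
    for char, freq in transitions.items():
        threshold -= freq
        if threshold < 0:                  # the weighted random pick lands here
            return char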
The function next_char() performs the random walk. It is recursive: it calls pick_char() until it returns a character that satisfies the rules, then recurses with the newly picked character as the new ‘current state’ and the old ‘current state’ as the previous state.
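A recursive sketch along those lines, reusing pick_char() and the assumed allowed() check from above:

def next_char(chain, prev_char, current_char):
    # Keep sampling until the rules accept a character (this assumes the chain
    # always offers at least one allowed transition from the current state).
    picked = pick_char(chain[current_char])
    while not allowed(prev_char, current_char, picked):
        picked = pick_char(chain[current_char])
    if picked == '.':                      # reached the end state, stop the walk
        return ''
    # Recurse: the picked character becomes the current state, and the old
    # current state becomes the previous state.
    return picked + next_char(chain, current_char, picked)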
Putting It All Together…
Now we can call the previously discussed functions to carry out a random walk of the Markov chain and generate some names!
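A sketch of the wiring, using build_chain(), pick_char(), next_char(), and allowed() from the earlier sketches; the file name drug_names.txt and the helper generate_name() are placeholders of mine:

def generate_name(chain):
    # Draw the first letter from the '' start state, then walk until the end state.
    first = pick_char(chain[''])
    return first + next_char(chain, '', first)

with open('drug_names.txt') as f:          # stand-in path for the filtered names
    names = [line.strip() for line in f if line.strip()]

chain = build_chain(names)
for _ in range(10):
    print(generate_name(chain))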
And this is what we get when we run it:
flurimbin
kenzaidel
uvinaze
colmix
iluprarax
zelare
irevene
enxophir
trexin
hysesox
Conclusion
Building a Markov chain to generate fantasy names is a fun introduction to stochastic processes. Beyond fantasy names, Markov chains and Markov processes more broadly could be used to make ‘predictions’ and generate values where the changes are a result of randomness. I hope you give this easy project a try and possibly expand it for other uses.
Here is the GitHub repo containing the code and data for this project (the generator is also implemented in JavaScript!). The data may be a bit outdated.