Markov models and Markov chains explained in real life: probabilistic workout routine

Markov defined a way to represent real-world stochastic systems and processes that encode dependencies and reach a steady-state over time.

Carolina Bento
Towards Data Science


Image by Author

Andrei Markov didn’t agree with Pavel Nekrasov, who claimed that independence between variables was necessary for the Weak Law of Large Numbers to apply.

The Weak Law of Large Numbers states something like this:

When you collect independent samples, as the number of samples gets bigger, the mean of those samples converges to the true mean of the population.

But Markov believed independence was not a necessary condition for the mean to converge. So he set out to define how the average of the outcomes from a process involving dependent random variables could converge over time.

Markov chain: a random chain of dependencies

Thanks to this intellectual disagreement, Markov created a way to describe how random, also called stochastic, systems or processes evolve over time.

The system is modeled as a sequence of states and, as time goes by, it moves in between states with a specific probability. Since the states are connected, they form a chain.

Following the academic tradition of naming discoveries and new methods after the people that developed or discovered them, this way of modeling the world is called a Markov Chain.

Example of a Markov chain.

What’s particular about Markov chains is that, as you move along the chain, the state where you are at any given time matters. The transitions between states are conditioned, or dependent, on the state you are in before the transition occurs.

Putting all of these characteristics together, Markov was able to prove that, as long as you can reach all states in the chain, the probability of moving to a particular state will converge to a single steady value in the long run.

Markov Model or Markov Chain?

A Markov chain is the simplest type of Markov model[1], where all states are observable and probabilities converge over time.

But there are other types of Markov models. For instance, Hidden Markov Models are similar to Markov chains, but they have a few hidden states[2]. Since they’re hidden, you can’t see them directly in the chain, only through the observation of another process that depends on them.

What you can do with Markov Models

One of the pivotal applications of Markov chains in real world problems was conducted by Claude Shannon while he was working at Bell Labs.

Claude Shannon (Credit)

Claude Shannon is considered the father of Information Theory because, in his 1948 paper A Mathematical Theory of Communication[3], he created a model for how information is transmitted and received.

Shannon used Markov chains to model the English language as a sequence of letters that have a certain degree of randomness and dependencies between each other.

In the end, this Markov model was able to produce English-like text.

Claude Shannon’s groundbreaking work boosted the development of scientific areas like Information Theory, but also Natural Language Processing and Computational Linguistics, where Markov models are widely used.

Through the work of Claude Shannon and many others after him, we can conclude that Markov models:

  • Describe the world in a more realistic way,
  • Are a useful tool to make long-term predictions about a system or process.

Realistic tool to describe the world

Most real-world systems and phenomena involve multiple parts, which are rarely independent of each other. Creating a model that encodes those dependencies is a more realistic way of representing the real world.

Claude Shannon used the English language as an example, but let’s think about something else you’re familiar with: the occupancy of a parking lot.

Parking lots have a fixed number of spots available, but how many of these are available at any given point in time can be described as a combination of multiple factors or variables:

  • Day of the week,
  • Time of the day,
  • Parking fee,
  • Proximity to transit,
  • Proximity to businesses,
  • Number of free parking spots in the vicinity,
  • Number of available spots in the parking lot itself (a full parking lot may deter some people from parking there)

Some of these factors may be independent of each other; others are not. For instance, Parking Fee typically depends on Time of day and Day of week.

Long-term predictions

Since Markov models describe the behavior over time, you can use them to ask different questions about the future state of the system:

  • How it evolves over time: In what state is the system going to be in N time steps?
  • Tracing possible sequences in the process: When the system goes from State A to State B in N steps, what is the likelihood that it follows a specific path p?

Going back to the parking lot example, you can ask:

  • What is the occupancy rate of the parking lot 3 hours from now?
  • How likely is the parking lot to be at 50% capacity and then at 25% capacity in 5 hours?

Workout routine as a Markov Chain

Over the past year, you’ve settled on a go-to workout routine. You’ve been optimizing it for the areas you want to improve and your personal style. So, in a typical workout day, you end up doing several sets of these exercises:

  • Run (3 miles)
  • Push-ups (20)
  • Rowing (15 minutes)
  • Pull-ups (20)

These are your recurring exercises, but you’ve noticed you don’t always do them in a specific order. Even though it’s always different, there’s a pattern. You pick the first exercise at random and go from there, but the next set is not completely random: it usually depends on the exercise you did before.

You want to understand more about your optimal workout routine and even plan the next workout based on how you normally structure it. So you realize that your workout routine can be modeled as a Markov chain.

Since you pick the next exercise set based on the set you’ve done before, your workout routine follows the Markov assumption: the probability of transitioning to the next state depends only on the state you are currently in.

A Markov chain has short-term memory: when deciding where to go next, it only remembers where you are right now.

This means the path you took to reach a particular state doesn’t impact the likelihood of moving to another state. The only thing that can influence the likelihood of going from one state to the other is the state you are currently in.
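In probability notation, the Markov assumption reads P(next exercise | entire workout history) = P(next exercise | current exercise): everything you did before the current set drops out of the conditioning.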

Knowing all of this, you can ask interesting questions about your workout:

  • If I start the workout with a run, how likely am I to do push-ups on the second set?
  • In a 3-set workout, how likely am I to do: 1) run, 2) push-ups and 3) pull-ups?
  • What’s the likelihood of doing any of the exercises on the fourth set?

To find a way to answer those questions, first you need to go through your workout logs. Thankfully, you keep a good record of your workout sets and, looking at the last 62 days, you had the following routine:

Exercise logs from the last 62 workout sessions.

Digging through the logs, you see that in the past 62 days, 8 out of 20 times you did a 15 minute rowing set, the next set was a 3 mile run.

Considering each exercise set as a state in your workout Markov Chain, the next thing you do is to encode the dependencies between states, using conditional probabilities.

In the context of Markov models, these conditional probabilities are called transition probabilities.

Transition probabilities describe the transition between states in the chain as a conditional probability.

After crunching the numbers, your transition probabilities are:

Transition probabilities for each sequence of exercise sets.

So, the likelihood of doing 20 push-ups given that you’ve just finished a 3 mile run is 40%. That’s because P(20 push-ups | Run) = 4/10 = 0.4.
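This counting can be automated. Here’s a minimal sketch, assuming the logs are kept as a plain Python list of exercise names in the order you did them — the exercise_log below is just a made-up example, not the actual 62-day log:

from collections import Counter, defaultdict

# Hypothetical log: each entry is the exercise done in one set, in order.
exercise_log = ['Run', 'Push-ups', 'Rowing', 'Run', 'Pull-ups', 'Run', 'Push-ups']

# Count how often each exercise is followed by each other exercise.
transition_counts = defaultdict(Counter)
for current_set, next_set in zip(exercise_log, exercise_log[1:]):
    transition_counts[current_set][next_set] += 1

# Turn the counts into conditional (transition) probabilities.
transition_probabilities = {
    current_set: {next_set: count / sum(counts.values())
                  for next_set, count in counts.items()}
    for current_set, counts in transition_counts.items()
}
print(transition_probabilities)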

Put in mathematical notation, these probabilities can be represented as a transition matrix[4].

Transition matrix representing the transition probabilities in the workout routine chain.

And to better visualize the transitions between states, you can represent the workout Markov Chain as a directed graph.

Your workout routine modeled as a Markov chain.

Now that you’ve crunched the data and created a Markov model for your workout routine, you have all the building blocks to start answering the questions you posed earlier.

If I start the workout with a run, how likely am I to do push-ups on the second set?

To understand how the workout routine evolves over time there are three components to take into account:

  • Starting state,
  • End state,
  • Time-frame, i.e., how long it will take the model to get from the start to the end state.

In this case the start state is running 3 miles, because that’s how you chose to begin the workout, and the end state is doing a 20 push-up set. Since you’re making predictions for the second set of exercises, the time-frame is two.

Next you’ll calculate the total probability. You’ll combine all the possible ways, or paths in the Markov chain, where you start the workout with a run and in two time steps do push-ups.

Given these criteria, you have the following paths:

Path 1: Go for a run given that you’ve just completed a run, then do a set of push-ups.

Path 2: Do 20 push-ups given that you’ve just completed a run, then do another set of push-ups.

Path 3: Row for 15 minutes given that you’ve just completed a run, then do 20 push-ups.

Path 4: Do 20 pull-ups given that you’ve just completed a run, then do 20 push-ups.

Adding up the probabilities of all these paths, the likelihood of doing a two-set workout where you start with a run and end with push-ups is 25%.
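As a sanity check, here’s the arithmetic behind that 25%, written out as a small Python snippet using the transition probabilities from the workout transition matrix:

# Each two-step path from Run to Push-ups, as a product of transition probabilities.
paths = [
    0.1 * 0.4,   # Path 1: Run -> Run -> Push-ups
    0.4 * 0.1,   # Path 2: Run -> Push-ups -> Push-ups
    0.3 * 0.3,   # Path 3: Run -> Rowing -> Push-ups
    0.2 * 0.42,  # Path 4: Run -> Pull-ups -> Push-ups
]
print(round(sum(paths), 2))  # 0.25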

Adopting a matrix notation you can extend the probabilities of the two-set workout to all other combinations of start and end states.

Since this corresponds to what the chain will look like in two time-steps, this matrix is called the second power of the transition matrix.

Second power of the transition matrix, i.e., the state of the Markov Chain at time-step 2.

Future states are calculated using recursion

Future states are conditioned by where you are before each transition. So, to calculate a future state, you take the previous power of the transition matrix and multiply it with the transition matrix of the model.

In this case, for two time-steps, you multiply the transition matrix by itself. But, if you were to see where the chain will be at step 3, you would take the matrix P2 above and multiply it by the transition matrix.

This sequence of multiplications using previously computed matrices is called a recursion. In a generalized form, it can be defined as:

Markov chain recursion for n time steps.

That’s why the matrix that results from each recursion is called a power of the transition matrix.
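Here’s how that recursion looks in NumPy — a minimal sketch, assuming the transition_matrix array that is defined further down in the article:

import numpy as np

# Second power: where the chain can be after two time-steps (the matrix P2 above).
second_power = np.matmul(transition_matrix, transition_matrix)

# Third power, obtained recursively by multiplying the previous power by the transition matrix.
third_power = np.matmul(second_power, transition_matrix)

# NumPy can also compute the n-th power directly.
same_third_power = np.linalg.matrix_power(transition_matrix, 3)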

Steady-state probabilities

A characteristic of what is called a regular Markov chain is that, over a large enough number of iterations, all transition probabilities will converge to a value and remain unchanged[5]. This means that, after a sufficient number of iterations, the likelihood of ending up in any given state of the chain is the same, regardless of where you start.

So, when the chain reaches this point, we can say the transition probabilities reached a steady-state.

This is what Markov ultimately wanted to prove!

The time-step when the chain reaches a steady-state depends on the transitions that are possible between states and their probabilities.

But we can write a Python method that takes the workout Markov chain’s transition matrix and runs through it until it reaches a specific time-step or the steady state.

import numpy as np


def run_markov_chain(transition_matrix, n=10, print_transitions=False):
    """
    Takes the transition matrix and runs through each state of the Markov chain for n time steps.
    When the chain reaches a steady state, prints the time step of the convergence and returns the transition probabilities.
    @params:
    - transition_matrix: transition probabilities
    - n: number of time steps to run. default is 10 steps
    - print_transitions: tells if we want to print the transition matrix at each time step
    """
    step = transition_matrix

    for time_step in range(1, n):
        if print_transitions:
            print('Transition Matrix at step: ' + str(time_step))
            print(step)
            print('-------------------------')

        next_step = np.matmul(step, transition_matrix).round(2)

        if np.array_equal(step, next_step):
            print('Markov chain reached steady-state at time-step = ' + str(time_step))
            if not print_transitions:
                print(step)
            return step
        else:
            step = next_step

    return step

You just need to plug in the workout transition matrix.

transition_matrix = np.array([[0.1, 0.4, 0.3, 0.2],
[0.35, 0.1, 0.25, 0.3],
[0.4, 0.3, 0.05, 0.25],
[0.42, 0.42, 0.08, 0.08]])
power_transition_matrix = run_markov_chain(transition_matrix)

And check that the workout Markov chain reaches a steady-state on the 5th time step.

Workout Markov chain reached a steady-state on time step 5.

If you’re curious to see the transition matrix at each time step, you just need to set print_transitions to True.

run_markov_chain(transition_matrix, print_transitions=True)
Workout Markov chain transition matrix at each time-step before reaching a steady-state.

What if the chain doesn’t reach a steady-state?

Only regular Markov chains converge over time. If your Markov chain does not converge, it has a periodic pattern.

In Markov chains that have periodicity, instead of settling on a steady-state value for the likelihood of ending in a given state, you’ll get the same transition probabilities from time to time.

But you can test if your Markov chain will eventually converge.

A Markov chain is considered regular if some power of the transition matrix has only positive, non-zero, values[5].

For any n x n transition matrix, you can check whether all powers of the transition matrix up to power n² − 2n + 2 have only positive, non-zero values.

You’ve just seen that the workout Markov chain reaches the steady-state at time-step 5. But if you wanted to test if the chain was regular, you would have to check all powers of the transition matrix up to power 10.
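To automate that test, here’s a minimal sketch of such a check — the is_regular function is not from the original code, just an illustration of the rule above:

def is_regular(transition_matrix):
    """Returns True if some power of the matrix, up to power n^2 - 2n + 2, has only positive entries."""
    n = transition_matrix.shape[0]
    max_power = n ** 2 - 2 * n + 2

    power = transition_matrix.copy()
    for _ in range(max_power):
        if np.all(power > 0):
            return True
        power = np.matmul(power, transition_matrix)

    return False


print(is_regular(transition_matrix))  # True: every entry of the workout matrix is already positive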

In a 3-set workout, how likely am I to do: 1) run, 2) push-ups and 3) pull-ups?

The likelihood of doing a workout that starts with a run, then 20 push-ups and ends with 20 pull-ups is the result of following a specific path in the Markov chain.

Since each step in the chain corresponds to a conditional probability, the likelihood of following a specific path is the product of all the conditional probabilities that make up that path.

In this case, the likelihood of doing Run → Push-ups → Pull-ups is 12%.
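In code, that’s a single multiplication of the two transition probabilities involved:

# P(Push-ups | Run) * P(Pull-ups | Push-ups)
path_probability = round(0.4 * 0.3, 2)
print(path_probability)  # 0.12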

What’s the likelihood of doing any of the exercises on the fourth set?

In the first question you found a way to predict which exercise you’d be doing at a specific time in the future, using powers of the transition matrix. The important detail there was that you knew the workout started with a Run.

But, what if you could start the workout with any exercise?

If you’re equally likely to pick any one of the exercises to start the workout, you can say the starting state of the workout Markov chain follows a discrete uniform distribution.

Given that you have 4 possible exercises and the starting state distribution is uniform, the likelihood of starting the workout with each exercise can be described with a probability vector, v_start = [0.25, 0.25, 0.25, 0.25].

Then, to know the likelihood of doing any of the exercises by the fourth set, you need to compute the probability distribution over states at time-step four.

You simply need to multiply the starting probability vector by the fourth power of the transition matrix.

def state_probability_at_power(start_probability_vec, power, transition_matrix):
    """
    @params
    - start_probability_vec: vector with the probability distribution for each starting state
    - power: number of time steps to run the markov chain
    - transition_matrix: transition probabilities
    """
    probability_at_power = np.matmul(start_probability_vec, run_markov_chain(transition_matrix, n=power)).round(2)

    print('Probability of being in each state at time-step = ' + str(power))
    print(probability_at_power)


start_probability_vec = np.array([0.25, 0.25, 0.25, 0.25])
state_probability_at_power(start_probability_vec, 4, transition_matrix)

By the fourth set, you’re more likely to do 20 push-ups or go for a run.

You’ve just modeled your workout routine with a Markov chain and made predictions about future workouts!

Hope you enjoyed learning about Markov chains and Markov models.

Thanks for reading!

References

Images by author except where stated otherwise.
