Modeling Starbucks Waiting Time Using Markov Chains, with Python

Here’s how to use Markov Chains to determine your waiting time for your Starbucks coffee

Piero Paialunga
Towards Data Science

--

Photo by Jon Tyson on Unsplash

I’m from Italy and it is safe to say that for us coffee is a religion. We drink coffee to socialize, we drink coffee to wake up in the morning, we drink coffee after lunch, and we drink coffee after dinner. When you haven’t seen a friend in a while we say

“Vieni a prenderti un caffè”

Which means

“Come here and let’s drink a cup of coffee”

I live in America and the approach of American people to coffee is different. I drink to take away coffee when I go to work. I drink coffee when I work. I drink coffee while I watch a movie. American people don’t drink “espresso” but they enjoy a long time that it takes to drink the whole big cup of coffee. One more thing: there are multiple kinds of coffee!

If you get in a Starbucks you see maybe a hundred possible coffee you can get. It can be black, it can be a macchiato, it can be a latte, it can be a Frappuccino, it can be a lot of other things which I don’t even know. 😅

There are some very easy-to-make cups of coffee and there are more complex ones, that require more time to be made. So let’s say you are in line for a coffee at Starbucks. If there are 3 people in front of you and they all order black coffee, you will probably have to wait something like 3 minutes before getting your order.

Nonetheless, if they order an “extra whipped cream caramel macchiato with sprinkles and cinnamon with soy milk”… well, that can maybe double your waiting time, or at the very least you would have to wait a couple of extra minutes.

So the question is…

“How much time do I have to wait before I get my coffee and I can go writing an article about how much time I have to wait before I get my coffee?”

Of course, we have no idea what other people are going to order, so this is a probabilistic problem (or if you want a stochastic process). So how do we solve it?

A doable approach is to build a Markov chain. In particular, what we will need is a Time-Dependent Markov Chain.

Let’s build the problem!

1. Theoretical Introduction

Let’s start by giving a theoretical introduction to the problem and setting it right.

Let’s start with the simplest case. We get inside our Starbucks and we have to order our coffee. Let’s say Mathematically speaking, we can be in one of these three states.

The first state (O) is the one where we order our coffee. The second one (M) is the one where we already ordered our coffee and are waiting to get it. It’s M cause they are Making a cup of coffee. Then you get the coffee and your transit towards the Leaving state. This means that your coffee is ready and you are good to go.

Image by author

Great. Now, what are the possible transitions?

  1. You surely go from Ordering to Making. (O to M).
  2. You can go from M to M (keep waiting).
  3. Eventually, you’ll go from M to L.
Image by author

Now how do we formalize it?

1.1 About the Markov Chains

What is the assumption of the Markov Chain? The assumption is the following:

“The probability of being in the next state only depends on your current state”

For example:

The probability of being in the state Leaving in the time step t = 5 only depends on the fact that you are in the state Making in the time step t = 4.

Let’s formalize it:

Image by author

In the notation above we have that the probability of being, at time t, in the space s_t (O, M, L) is only dependent on the state that we are at time t-1.

The thing that we need to keep in mind in our specific experiment is that this probability has to be time-dependent too. This is the case, because, of course, if I have waited 5 minutes the probability of leaving in the next minute is higher than the probability of leaving if I waited for 1 minute only.

So that means that:

Image by author

That is exactly the concept we were talking about before.

Now, of course, it’s not only us in Starbucks. There are a lot of other customers too! So we will complicate this setup in a minute. But what you see above is the starting point for everything that we will do.

Let’s get this started! 🤩

2. One Customer One Drink Example

Let’s make the simplest example possible. We know what drink to get and we are the only ones in the coffee shop.
Let’s say we want a caramel macchiato. All recipes suggest that it takes 5 minutes. Let’s say that it takes 30 seconds to order and pay. So the total waiting time would be 5.30 minutes. But let’s go further. Let’s say that 5 minutes is the average starting time. Let’s also say that we could have our coffee ready in 4 minutes or 6:

Image by author

Great. Now let’s say that our time tick is 30 seconds (0.5 minutes).
After 30 seconds it is very unluckily that we would have our caramel macchiato. After 8 minutes it is very unluckily that we would still be waiting.

2.2 Hands-On Implementation

Let’s implement this. Let’s start by importing some libraries:

Let’s define our states:

Great. Now let’s run the whole process described above. We will do that using this function:

This is one example:

3. One Customer Multiple Drinks Example

Now, let’s increase the complexity of the model and make it more realistic.
Let’s say we don’t know what that specific customer wants. This is way more realistic because we have more than 150 drinks in a Starbucks and they may require a different waiting time.

Now the Markov Chain looks like this

Image by author

The difference between the previous one is that we have a probability distribution, which we will call:

Image by author

In particular, this is a probability distribution over all the possible drinks we can get, which we will call:

Image by author

For example:

Image by author

So that we can say:

Image by author

3.2 Hands On Implementation

From this Kaggle dataset (CC0: Public Domain) we can import all the Starbucks drinks:

We don’t need the other columns.
Now, we don’t know what is the most chosen beverage, let’s just make a fake distribution for our purposes:

So we extract values from 50 to 100, then we normalize this using the sum so that we have a probability vector.

So we have our probability distribution. What do we do with it?
Well, this probability distribution will make an order for us: we will just sample out of the distribution. Then we have to assume that each coffee requires a different time to be made.

Now, objectively, there is not that big of a difference. Let’s say it will make 3 minutes (as a mean value) for the “simplest” coffee (let’s say that it’s a black coffee I guess…☕️) and 6 minutes (as a mean value) for the most complex coffee to be made (let’s say the extra whipped cream caramel macchiato with sprinkles and cinnamon with soy milk 🙃).

Let’s play with different variances as well (from 0.1 to 1).

So as we see before we have the mean value that can go from 3 to 6, so you can have an average waiting time that can go to 3 to 6 minutes, and it happens that, sometimes, you have more uncertainty than other times. For example, it is possible that you have a largest variance when you are drinking coffee “A”.

Let’s implement the following situation:

Now, I believe that at this point we can make extract some meaningful information. For example…

“How much time does it usually take to grab a whatever coffee at Starbucks?”

This question is pretty generic, so we might as well answer with some statistics*:

*I changed the function a little bit removing all the “print so that my computer doesn’t explode. 🥲

4. Multiple Customers Multiple Drinks Example

Great. Now we have multiple customers. What do we do?
The situation looks like this:

Image by author

Now, it is unrealistic to think that we have to wait that one coffee is made before they start making yours. Usually you have 3,4 barista at Starbucks.

This complicates the code a little bit, because we need to keep track of the fact that all the baristas are actually busy or not. For this reason, the function that we will consider is a function of the number of customers and the number of baristas.

That’s the function that does so:

Ok, so let’s test a case when we have 4 customers and 2 baristas.

So as you can see in the first two cases we have the baristas that are free, and they wait, respectively, seven and 6.5 minutes*.

*Note! That is not a mistake! It’s just means that the second coffee is faster than the first coffee+the ordering time. It makes sense if you think about it. If you order a complex coffee and I just order a latte, I might wait less than you even if you ordered before me.

Then the baristas are busy. So we have to wait an additional 6 minutes before I can actually order. (I could make it more realistic and subtract the 1 minute of the ordering time, but you get the idea right?)

By the time the fourth customer orders, there is one free barista, so he doesn’t have to wait that long. He just waits 5.5 minutes.

Notice how complex this model can become and how interesting the situation can be. This is a case with 10 customers and 5 baristas:

5. Considerations

In this article, we discussed a general problem of waiting time in a line. There are of course multiple scenarios:

  1. We might be the only one in the store and know exactly what we want
  2. We might be the only one in the store and don’t know exactly what we want
  3. We might be a lot of people in the store and don’t know exactly what we want

These three scenarios are of course more and more complex. For this reason, the Markov Chain gets more and more complicated as we saw earlier.

  1. In the first case, the only probability distribution is the one of the “coffee-making” bit of the process. For example, it takes on average 5 minutes to do our caramel macchiato, with a gaussian distribution and variance of 1
  2. In the second case, we have the probability distribution of the coffee that we picked plus the probability distribution of the coffee-making bit, that is the one before
  3. In the third case, we have both of the above, and the number of baristas becomes another thing to consider

We could of course complicate this model a lot. We could consider the fact that machines can be broken. We can consider a shortage of baristas. We can consider the fact that the number of baristas changes with time. We can consider that some baristas are faster than others. And many more possibilities.

This starting point can be used to complicate the model in pretty much all the ways you want. Just keep in mind that we are using the tool of a Time-Dependent Markov Chain and just build from there 😎

6. Conclusions

If you liked the article and you want to know more about Machine Learning, or you just want to ask me something you can:

A. Follow me on Linkedin, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me to receive all the corrections or doubts you may have.
C. Become a referred member, so you won’t have any “maximum number of stories for the month” and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.

--

--

PhD in Aerospace Engineering at the University of Cincinnati. Machine Learning Engineer @ Gen Nine, Martial Artist, Coffee Drinker, from Italy.