Parking, image by Dall-E 2.

When Should You Stop Searching?

An introduction to optimal stopping and how it relates to data science

Hennie de Harder
Towards Data Science
10 min read · Mar 31, 2023


In our everyday lives, we are often faced with the challenge of making decisions that have significant consequences. Whether it’s choosing a career, searching for a parking spot, buying a house, or even deciding who we want to spend the rest of our lives with, we are constantly evaluating and weighing different options. One strategy that mathematicians use to approach decision-making is called “optimal stopping”.

In this post, you will learn what optimal stopping is and how to use it in different real-life situations. It might be surprising, but you can also use it as a data scientist.

Let’s consider a real-life example to understand optimal stopping. Imagine you are interviewing candidates for a job. You have a limited amount of time to interview each candidate, and after each interview you must decide whether to hire that candidate or move on to the next one. Once a decision is made, you cannot reconsider it. The goal is to hire the best candidate possible. At any point in time, you cannot know whether the current candidate is the best, because you haven’t seen all candidates yet! There are two ways to fail: you might stop too early, or you might stop too late.

Candidates over time. After every candidate, you decide: hire or no hire. Image by author.

In the image above, you can replace the people with houses, job offers, dogs… really anything you want to make a decision about where the options arrive in sequence. As a data scientist, you can encounter optimal stopping problems in different fields, such as healthcare, marketing, or finance.

When to stop?

So how do you make the best decision? This is where optimal stopping comes into play. Optimal stopping is a mathematical concept that involves finding the best time to make a decision based on limited information. The limited information consists of all the options you have seen up to that point.

Let’s go back to the interview example from the introduction. An often-used strategy for this problem is Look-then-Leap: for a fixed number of candidates, you only look. You don’t hire, but you gather information. After this fixed number of interviews, you hire the first candidate who is better than everyone you have seen so far. But how much time should you spend searching (looking)? We can easily test the success rate of different strategies by simulating the problem. Let’s start with 3 candidates to keep it simple.
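Such a test is easy to write. A minimal sketch in Python: enumerate every possible order of the candidates (each candidate represented by a rank, where rank 1 is the best) and count how often a given looking-phase length ends up hiring the best one.

```python
from itertools import permutations

def hired_candidate(order, look):
    """Look-then-Leap: skip the first `look` candidates, then hire the
    first one better than everyone seen so far (lower rank = better)."""
    best_seen = min(order[:look]) if look else float("inf")
    for rank in order[look:]:
        if rank < best_seen:
            return rank
    return order[-1]  # nobody beat the looking phase: stuck with the last

def success_rate(n, look):
    """Fraction of all candidate orders in which the best (rank 1) is hired."""
    orders = list(permutations(range(1, n + 1)))
    return sum(hired_candidate(order, look) == 1 for order in orders) / len(orders)
```

For 3 candidates this reproduces the success rates of the three scenarios discussed next: 1/3 without looking, 1/2 when looking at exactly one candidate, and 1/3 when looking at two.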

In the image below, we hire directly (so there is no looking period). We interview only the first candidate and hire them on the spot. We have a 33.3% chance of hiring the best candidate (number 1), but also a 33.3% chance of hiring each of the other candidates (number 2 or 3):

Scenario 1: don’t look, directly leap. Image by author.

Our second option is to not hire the first candidate we interview (look), and then hire the second candidate if they are better than the first. If the second candidate isn’t better, we automatically end up with the last candidate. How successful is this strategy? This is what happens in this scenario (orange dots are hired):

Scenario 2: don’t hire the first candidate, then hire the next if that one is better, else hire the last. Image by author.

Now we hire the best candidate in 50% of the possible sequences! And we only hire the worst candidate in 1 out of 6 cases, instead of 1 out of 3.

The final scenario is that we don’t hire the first two candidates, which means we always end up with the last one. That is as good as scenario 1: we end up with the best candidate in 1 out of 3 cases. But wait… we spend more time interviewing in this scenario! That’s true, and that’s why most people would prefer scenario 1 over scenario 3. For three candidates, the optimal stopping point for looking is after 1 candidate, or 33.3% of all candidates (scenario 2).

What happens if we increase the pool of candidates? For a given number of candidates, we can find the optimal stopping percentage with simulation. The graph below visualizes this:

Percentage of candidates to look at before you leap. Simulation of 100,000 trials per combination of candidate count and looking percentage. Image by author.

To be precise: as the number of candidates grows, both the optimal looking percentage and the probability of selecting the best candidate converge to 1/e, approximately 0.368, or 36.8%.

We can also calculate the percentage of cases in which we hire the best candidate if we stick to the looking percentages of the plot above:

Percentage of cases in which you hire the best candidate if you stick to the best looking percentage. Again based on simulation. Image by author.

37 seems to be the magic number here! Good to know: you can also apply this to time. If you have a time limit, spend 37% of the total time looking and then settle for the next candidate who is better than everyone seen so far.
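For this classic version of the problem, the simulation can even be replaced by an exact calculation. The best candidate at position j is hired exactly when the best of the previous j - 1 candidates falls inside the looking phase, which gives P(k) = (k/n) * (1/k + 1/(k+1) + ... + 1/(n-1)) for a looking phase of k out of n candidates. A short sketch:

```python
def best_candidate_prob(n, k):
    """Exact probability of hiring the best of n candidates with a
    looking phase of k candidates (Look-then-Leap)."""
    if k == 0:
        return 1 / n  # hire the very first candidate blindly
    return (k / n) * sum(1 / i for i in range(k, n))

def optimal_look(n):
    """Looking-phase length that maximizes the success probability."""
    return max(range(n), key=lambda k: best_candidate_prob(n, k))
```

For 100 candidates this yields an optimal looking phase of 37 candidates with a success probability of about 37.1%, matching the simulations.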

There are some downsides to this approach: if the best person was already interviewed during the looking phase, you automatically end up with the last person you interview, and that might be the worst candidate! Besides that, you spend hours or days interviewing everyone, especially if the number of candidates is high. We should include the cost of searching in our model.

How does searching work in real life?

As with many mathematical models, it can be hard to apply the 37% rule directly to real life. In real life, the rules aren’t clear-cut and there is a cost of searching.

In the interview example, we assume we know nothing about the candidates before interviewing them (no information). But that’s quite unrealistic: you have resumes, a LinkedIn page, maybe a case study or questionnaire, so you can already start comparing candidates before the interviews start. Another assumption in the first example was that a candidate always accepts the offer, and that we can’t return to an earlier candidate. In a real interview setting, you can interview multiple candidates and decide which one is best after you have interviewed several of them.

For these cases, you can add knowledge to the model and see how this affects your decision. Let’s take a look at them one by one.

No information vs. complete information

In game theory, there are the concepts of incomplete information and complete information. Imagine you have knowledge about the abilities of the candidates and can place each of them in a percentile. In this case, you can use a threshold on the percentile value to decide whether to hire the candidate, depending on when they are interviewed:

Hiring threshold for the nth candidate if there are 10 candidates in total. Image by author.

During the interviews, you become less and less critical, because the chance that a better candidate will still come after the current one decreases as you get closer to the end of the candidate list.
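These thresholds can be computed by backward induction. As an illustrative variant (a simplification, not necessarily the exact model behind the plot above), assume each candidate’s score is an independent Uniform(0, 1) percentile and that we maximize the expected score of the hire: accept a candidate whenever their score beats the expected value of continuing to search.

```python
def percentile_thresholds(n):
    """Acceptance threshold per interview when scores are Uniform(0, 1)
    and the goal is to maximize the expected score of the hire."""
    v = 0.5  # expected score if forced to take the last candidate
    thresholds = [0.0]  # last candidate: always accept
    for _ in range(n - 1):
        thresholds.append(v)
        v = (1 + v * v) / 2  # E[max(x, v)] for x ~ Uniform(0, 1)
    return thresholds[::-1]  # thresholds[i] applies to interview i + 1
```

The thresholds decrease toward the end of the list (for n = 3 they are 0.625, 0.5 and 0.0), matching the intuition that you become less critical over time.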

Rejection and returning to previous candidates

In real life, candidates might turn down the offer, especially candidates who receive multiple offers (and those are often the better ones). If there is a high chance that a candidate will turn down the offer, we should change our approach.

Another option we have in real life is to go back to a candidate we already interviewed. In that case, we can interview more candidates and return to someone we rejected earlier to make an offer.

If returning is allowed but there is a 50% chance of rejection when making a second offer, the looking phase should be 61% instead of 37%. The probability of hiring the best candidate then also increases to 61%.

When rejection is 50% at the second offer, look for 61%. Image by author.
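This variant is easy to check with a Monte Carlo simulation. The model assumed here: offers made during the leap phase are always accepted, and if nobody has been hired by the end, you return to the best candidate seen so far, whose belated offer is accepted with a given probability.

```python
import random

def simulate_with_return(n, look, accept_prob, trials, seed=42):
    """Look-then-Leap with returning: if the leap phase ends without a
    hire, make a second offer to the best candidate seen, accepted with
    probability `accept_prob`. Returns the fraction of trials in which
    the best candidate (rank 1) ends up hired."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        order = list(range(1, n + 1))  # rank 1 = best
        rng.shuffle(order)
        best_seen = min(order[:look])
        hired = None
        for rank in order[look:]:
            if rank < best_seen:  # better than everyone seen so far
                hired = rank
                break
        if hired is None and rng.random() < accept_prob:
            hired = best_seen  # the returned offer is accepted
        wins += hired == 1
    return wins / trials
```

With 100 candidates, a looking phase of 61 and a 50% acceptance chance on the second offer, the simulated success rate lands around 61%.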

I don’t need the best, just give me a good one

Another interesting variation: what if you are not specifically interested in the best candidate, but want a candidate who is better than the majority? Or one of the two best candidates?

In this case, the optimal strategy is similar to when you are looking for the best. You start by establishing a baseline: look and don’t hire. Then, for the next period, you stop if there is a new best candidate. If no new best candidate is found during that period, in the following period you stop if there is a candidate who is the best or second best so far. This process continues until you reach the final candidate.

If there are seven candidates and you want to hire one of the two best candidates, use this strategy. Image by author.

Optimal stopping and data science

In different ways, optimal stopping is related to data science. The most straightforward ones are: solving optimal stopping problems with data science and using optimal stopping while working on a data science project.

Solving optimal stopping problems with data science

For a finite time horizon, dynamic programming is the easiest way to solve optimal stopping problems. When determining the value of the current option, you can use different (data science) methods:

  1. Least-squares Monte Carlo (LSMC)
    An interesting real-world application of optimal stopping is the trading of options in financial markets. Until the expiry date, the holder of an American-style option has the right to buy or sell the underlying asset at a predetermined price. LSMC uses simulated price paths to estimate the value of continuing to hold the option, which supports the decision to exercise now or keep waiting.
  2. Reinforcement Learning
    It is also possible to use deep Q-learning for determining the best policy in an optimal stopping problem. This paper explains it and the benefits of using it compared to LSMC.
  3. Deep Learning
    Neural networks can be used in different ways to solve optimal stopping problems. In deep Q-learning, as mentioned above, but also by directly predicting the expected pay-off of suitable stopping times. Another implementation varies the parameters of the neural network, which expresses different randomised stopping times.
  4. Trees and explainability
    A downside of many optimal stopping algorithms is explainability: it can be hard to explain the decisions they make. This isn’t a problem in some cases, but it can be in others. Explainability has benefits, because you can learn from the model and see the connection between the current state and the action. In this paper, the researchers create a tree model as the policy to make interpretation possible.
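To make the dynamic-programming idea concrete, here is a small backward-induction sketch for an American put option on a binomial (CRR) tree, the simplest relative of LSMC: at every node the holder faces a stopping decision, comparing the immediate exercise payoff with the discounted expected value of continuing to hold. The parameter values are made up for illustration.

```python
import math

def american_put_binomial(S0, K, r, sigma, T, steps):
    """Price an American put by backward induction on a CRR binomial tree."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1 / u                             # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Option values at maturity (j = number of up moves).
    values = [max(K - S0 * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = max(K - S0 * u**j * d**(step - j), 0.0)
            values[j] = max(exercise, cont)  # stop (exercise) or continue
    return values[0]
```

A deep in-the-money put is always worth at least its immediate exercise value, exactly because stopping now is one of the options.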

Using optimal stopping theory in data science projects

Optimal stopping problems are everywhere: in healthcare, finance, management, and so on. Optimal stopping can also be relevant for data scientists. Why? Because it provides a framework for deciding when to stop collecting data or searching for the best solution in a problem space. In many data science problems, collecting additional data or exploring more options is time-consuming, expensive, or both. Optimal stopping theory helps you determine the point to stop, minimizing cost while maximizing the value of the information obtained.

In marketing, a data scientist can encounter optimal stopping problems. Imagine you have a fixed inventory of a product and you want to sell the product over a certain timeframe. With a price promotion, you can influence the buying behavior of people. In this case, the remaining stock can determine if you should stop the price promotion to maximize profit, or if you should continue with it.

Marketing example: decide after fixed periods of time to continue or stop with a promotion for a certain product based on stock level. Goal is to maximize profit. Image by author.
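A toy version of this decision can be written as a small dynamic program. The model below is a deliberate simplification I made up for illustration: demand per period is fixed and known, the promotion raises demand but lowers the margin per unit, and in every period you decide, based on the remaining stock, whether to run the promotion.

```python
def promotion_policy(periods, max_stock, margin_normal, margin_promo,
                     demand_normal, demand_promo):
    """Backward induction over (period, stock): V[t][s] is the best
    achievable profit from period t onward with s units left, and
    policy[t][s] says whether to run the promotion in that state."""
    V = [[0.0] * (max_stock + 1) for _ in range(periods + 1)]
    policy = [[False] * (max_stock + 1) for _ in range(periods)]
    for t in range(periods - 1, -1, -1):
        for s in range(max_stock + 1):
            sold_n = min(s, demand_normal)
            sold_p = min(s, demand_promo)
            normal = sold_n * margin_normal + V[t + 1][s - sold_n]
            promo = sold_p * margin_promo + V[t + 1][s - sold_p]
            policy[t][s] = promo > normal
            V[t][s] = max(normal, promo)
    return V, policy
```

With little stock left, the higher-margin normal price wins (everything sells out anyway); with a lot of stock, the promotion becomes the better choice.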

You can also find optimal stopping problems in healthcare. One example is a transplant patient waiting for an organ. The value of an offered organ to the patient is measured in the estimated quality-adjusted life years (QALYs) the patient will gain if the transplantation happens. The goal for the data scientist is to find a policy that maximizes the QALYs.

Another parking lot by Dall-E 2.

Conclusion

The next time you need to make a decision, I hope you can use some of the tips from this post! If the process is similar to the interview example, e.g. when you want to buy a house, spend 37% of your time looking, and then take the next house that is better than everything you have seen so far. If one of the two best houses is good enough for you, you can also settle for the second best house seen so far after two periods of looking for the best.

And to answer all the questions: what are the rules when you are looking for a parking spot? With parking, there is a trade-off between the time spent searching for a good spot and the walking time from the parking spot to your final destination. In this case, it all depends on the occupancy rate. If it is 50%, you can simply continue driving until you are as close as possible to your destination. But if it is 98%, you should take the first free spot you see once you are 35 spots away from your destination.

Research shows that in real life, people stop searching earlier than the optimal 37%. Maybe the reason is one of the real-life complications mentioned earlier, like the cost of searching, or the fact that the second best is also good enough. As a data scientist, you can use data science techniques to solve optimal stopping problems, and you can use optimal stopping theory to stop searching at the best point in time.


📈 Data Scientist with a passion for math 💻 Currently working at IKEA and BigData Republic 💡 I share tips & tricks and fun side projects