Regression To The Mean — The Bitter Truth

Published in Towards Data Science, Jul 14, 2019

My attempt to illustrate regression to the mean

Picture this situation: You are the business analyst for a department store chain. All stores are similar in size and merchandise selection, but their sales differ because of location, competition, and random factors. You are given the results for 2018 and asked to forecast sales for 2019. You have been instructed to accept the overall forecast of economists that sales will increase overall by 10%. How would you complete the following data?

Store | 2018 $ Sales | 2019 $ Sales

1     | $1,000,000   |
2     | $10,000,000  |
3     | $2,000,000   |
4     | $15,000,000  |
5     | $20,000,000  | $22,000,000

If your analysis simply added 10% to the sales of each store, you were most likely wrong. But why? Keep on reading.

The example above is adapted from Max Bazerman’s text Judgment in Managerial Decision Making, and it appears on the last page of chapter 17 of Daniel Kahneman’s book Thinking, Fast and Slow. In that chapter, Kahneman clearly explains what regression to the mean is.

WHAT IS REGRESSION TO THE MEAN?

Regression to the mean, simply put, is the natural tendency of extreme scores to be followed by scores closer to the mean.
In the above example, adding 10% to the sales of every store is an error of judgment because your forecasts need to be regressive (i.e., adding more than 10% to the low-performing branches and adding less, or even subtracting, for the high-performing ones). All the stores are similar in size and merchandise selection, so part of the difference in their sales comes from random factors that are unlikely to repeat in exactly the same way. Those stores that did extremely well in 2018 are therefore likely to have lower sales growth in 2019 than the rest, and those that did very poorly are likely to have higher sales growth than the rest.
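To make the idea of a regressive forecast concrete, here is a minimal sketch in Python. It is not Bazerman’s or Kahneman’s method. The function name `regressive_forecast` and the factor `r` are hypothetical, and `r = 0.9` is an assumed value for how much of each store’s deviation from the chain average carries over to the next year. The sketch uses the four stores whose 2019 figures are blank in the table and assumes the 10% overall increase applies to their combined sales.

```python
def regressive_forecast(sales, growth=0.10, r=0.9):
    """Shrink each store toward the group mean, then apply overall growth.

    r is an assumed persistence factor (hypothetical, not from the article):
    r = 1 means no regression, r = 0 means every store lands on the mean.
    """
    mean = sum(sales) / len(sales)
    # Each store keeps only a fraction r of its deviation from the mean.
    return [(1 + growth) * (mean + r * (s - mean)) for s in sales]


# 2018 sales of stores 1 to 4 from the table above.
sales_2018 = [1_000_000, 10_000_000, 2_000_000, 15_000_000]
forecast_2019 = regressive_forecast(sales_2018)

for s, f in zip(sales_2018, forecast_2019):
    print(f"{s:>12,.0f} -> {f:>12,.0f} ({f / s - 1:+.0%})")
```

Under these assumptions the combined forecast still grows by 10%, but the weakest store is forecast to grow by roughly 76% and the strongest by roughly 4%. The exact figures depend entirely on the assumed `r`.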

UNDERSTANDING REGRESSION TO THE MEAN

Citing Kahneman’s example, imagine a group of depressed children who have been treated for eight weeks with an energy drink. Once they finish the treatment, the clinical results show a significant improvement in their depression.

After reading the above case, most of us tend to make (unconsciously) the following equation in our heads:
Energy Drink → Depressed Children = Improvement

Now imagine the same example, but this time, instead of the energy drink, those depressed children have been treated by standing on their head for 20 minutes a day for eight weeks. I am sure this time you (consciously) did not make the following equation in your head:
20' Standing On The Head → Depressed Children = Improvement

The bitter truth is that depressed children are an extreme group and, like the extreme sales numbers from the stores, they will naturally tend to regress toward their mean emotional state. This would most likely happen whether they drink or stand on their head.
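A small simulation can illustrate why an extreme group improves on its own. The sketch below is only an illustration under assumed numbers: each child’s observed score is modeled as a stable trait plus independent noise, both drawn from a standard normal distribution, and the function name and the cutoff of -1.5 are hypothetical. No treatment of any kind is applied between the two measurements.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def simulate_untreated_group(n=10_000, cutoff=-1.5):
    """Return the mean score of the extreme group before and after, with no treatment."""
    # Observed score = stable trait + independent noise on each measurement.
    traits = [random.gauss(0, 1) for _ in range(n)]
    first = [t + random.gauss(0, 1) for t in traits]
    second = [t + random.gauss(0, 1) for t in traits]
    # Select the extreme group using the first measurement only.
    selected = [i for i in range(n) if first[i] < cutoff]
    before = statistics.mean(first[i] for i in selected)
    after = statistics.mean(second[i] for i in selected)
    return before, after


before, after = simulate_untreated_group()
print(f"extreme group, first measurement:  {before:.2f}")
print(f"extreme group, second measurement: {after:.2f}")
```

The group was picked because its first scores were extreme, so part of that extremity was noise. On the second measurement the noise is drawn afresh, and the group average moves back toward the overall mean even though nothing was done to the children.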

These regression effects are present everywhere, and they are especially common in sports. In soccer, for instance, when a team that tends to win most of its games goes three games without a win, the press starts claiming that it is facing a major crisis. However, after a couple of games, the team most likely returns to its usual rate of wins.

The sports headline by the daily newspaper The Telegraph on 26 December 2018

WHY WE FAIL TO RECOGNIZE REGRESSION TO THE MEAN, AND THE SUBSTANTIAL RISK OF DOING SO

We often have difficulty recognizing regression to the mean because our minds are strongly biased toward causal explanations. We humans yearn to believe that there is a causal effect behind everything we detect (or want to detect), like:
Energy Drink → Depressed Children = Improvement

The bitter truth is that we are far too unwilling to believe that many of these effects are a consequence of randomness. If you doubt it, ask yourself whether you would buy a sports newspaper with the following headline:

“Although the team has gone three games without a win, it will likely return to its usual rate of wins. It has simply had some unlucky games; it’s a matter of randomness.”

Most of us would not buy the above headline because we would neither want to read nor believe that the cause of our team losing = pure randomness. Instead, we would look for a causal explanation, such as the coach not being the right one or the players having become lazy after so many victories.

Additionally, the consequences of these effects go beyond personal awareness and reach organizations. For example, suppose a company has an unusually poor year of customer satisfaction rates and decides to spend a huge part of its payroll budget on a training initiative for its employees. If results then show a marked improvement in customer satisfaction rates, managers might tend to believe:
Training Initiative → Employees’ Performance = Customer Satisfaction Improvement

The bitter truth is that without a control group (i.e., comparing employees who receive the training with employees who do not), the causal explanation of the good results might be seriously flawed: the improvement is also a plausible consequence of regression to the mean.
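The value of a control group can be sketched with another short simulation. Everything here is an assumption made for illustration: satisfaction scores are modeled as a stable trait plus yearly noise, only units with a poor year (score below -1.0) are included, and the training is given a true effect of exactly zero. The names `observed_score`, `mean_improvement`, and `TRUE_TRAINING_EFFECT` are hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_TRAINING_EFFECT = 0.0  # assumption: the training does nothing at all


def observed_score(trait):
    """One year's satisfaction score: stable trait plus that year's noise."""
    return trait + random.gauss(0, 1)


# Keep only units that just had an unusually poor year.
units = []
while len(units) < 2000:
    trait = random.gauss(0, 1)
    poor_year = observed_score(trait)
    if poor_year < -1.0:
        units.append((trait, poor_year))

# Randomly split the poor performers into a trained group and a control group.
random.shuffle(units)
trained, control = units[:1000], units[1000:]


def mean_improvement(group, effect):
    """Average change from the poor year to the following year."""
    return statistics.mean(observed_score(t) + effect - poor for t, poor in group)


naive_estimate = mean_improvement(trained, TRUE_TRAINING_EFFECT)
control_change = mean_improvement(control, 0.0)
controlled_estimate = naive_estimate - control_change

print(f"improvement of trained group:     {naive_estimate:.2f}")
print(f"improvement of control group:     {control_change:.2f}")
print(f"estimated effect of the training: {controlled_estimate:.2f}")
```

Looking at the trained group alone, the training appears to work, because scores rise after the poor year. The control group rises by about the same amount, so subtracting its change leaves an estimated effect close to the true value of zero.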

HOW TO BECOME MORE AWARE OF REGRESSION TO THE MEAN

If, after reading this, you are asking yourself how to become more aware of regression to the mean and of the randomness behind the effects you notice every day, here are a few tips.

1. Correlation is not causation
   Even when two variables are related, this does not mean that one is causing the other, e.g.:
   Energy Drink → Feeling Depressed = Feeling Better
   Training initiative → Employees taking training = Improvement on KPIs
   Lucky Socks → Exam = Pass

2. The more extreme the original score, the more regression you should expect
   Extremely poor sales → Larger improvement expected
   Average sales → Normal improvement expected

3. Accept randomness
   It sounds easy, but it is not. Try to stay aware that life is full of randomness, luck, and good fortune, and learn to deal with it.
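
The tip about extreme scores can be written down with a standard textbook relation, offered here as a sketch under assumptions the article itself does not state. Suppose two measurements of the same thing are expressed as standardized scores z_1 and z_2 (mean 0, standard deviation 1), are linearly related, and have correlation ρ between 0 and 1. These symbols are introduced only for illustration.

```latex
\mathbb{E}[z_2 \mid z_1] = \rho \, z_1,
\qquad
z_1 - \mathbb{E}[z_2 \mid z_1] = (1 - \rho)\, z_1
```

The expected move back toward the mean, (1 - ρ) z_1, is proportional to how far the first score was from the mean, so a more extreme z_1 implies more expected regression. When ρ = 1 there is no regression at all, and when ρ = 0 the best guess for the second score is simply the mean.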

REFERENCES

Kahneman, D. (2011). Thinking, Fast and Slow. Macmillan.