
Humans have plenty of cognitive strengths, but one area most of us struggle with is estimating, explaining, and preparing for improbable events. This theme underpins two of Nassim Taleb's major works: Fooled by Randomness and The Black Swan: The Impact of the Highly Improbable. In the latter, Taleb defines a black swan event as having three characteristics: the event is a surprise (to the observer), it has a major effect, and people incorrectly try to rationalize it in hindsight (emphasis mine).
Taleb focuses on black swan events on the world stage, such as the creation of the internet, World War I, and the dissolution of the Soviet Union. I believe that humans behave similarly, and to their detriment, with far less improbable events predicted by statistical models, as long as those events are important. We saw this recently in the results of the 2016 presidential election. Models published by The Huffington Post, The New York Times, and 538 all gave Hillary Clinton a 71–98% chance of winning. After she lost, a common reaction was to ask whether we could trust polls [and the models that rely on their data] ever again, which I was surprised to see even from statistically minded friends. Why is our gut reaction to scrutinize the model that said Trump had a 29% chance of winning? We wouldn't question the fairness of a coin that comes up tails twice in a row (a 25% chance) or a die that rolls a 1 (a 17% chance), both of which are less likely than the odds given to Trump winning the 2016 election.
I posit that this reaction, which is a form of rationalization by hindsight, comes from the lack of a mental framework for grappling with real-world event probabilities, and that not having such a framework can result in irrational decision-making. To illustrate this, let's move away from the world of election modeling to a more common situation encountered in the business world.
The darkest timeline
Imagine you're a data scientist who models the probability of closing sales with potential clients at a business-to-business tech company. Your company is in the process of courting a particularly large client, Megacorp, so the CEO checks your model every day to track the odds of closing the sale.
Thursday: 98% chance of closing the sale
Incredible news! Whatever your teammates are doing looks like it’s working.
Friday: 99% chance of closing the sale
Even better! The CEO starts planning the changes he'll need to make to the business after the anticipated windfall. Hire more employees? Acquire a competitor? Buy a Super Bowl ad?
Monday: 0% chance of closing the sale
What happened?! You learn that the prospective client chose to go with a competitor at the last minute. This is a huge blow to everyone's morale, and the time and energy the CEO spent dreaming about expansion after a sale he thought was practically guaranteed were wasted.
What’s a common reaction to this fiasco? "The model must be wrong!" the CEO declares during the postmortem meeting he calls with you. "We should have known that this wasn’t as likely as you said it was. I don’t want to hear from you until it’s fixed."
Creating a problem
It’s possible that the CEO is correct. Your model could have predicted a lower chance of closing the sale that your company didn’t end up getting. However, it’s also possible that the model is fine. Your company had a very high chance of closing that sale, but the unlikely happened, and you just didn’t get it.
You are now in an uncomfortable position. It's possible that the only way of "improving" the model is to produce one that estimates lower odds of closing the Megacorp sale at the expense of modeling future sales accurately. This is called overfitting. By complying with your CEO's request in order to stay in his good graces, you're doing damage to your model – and the company – by distorting predictions of future sales.
Back to basics
Before we get to my proposed mental framework for understanding complex, real-world event probabilities, let's start with a simple example. Imagine you have a fair six-sided die. You roll it over and over and record the results. After many rolls, you observe what you'd expect: each number, one through six, comes up about equally often.
After a few hundred rolls, you hit an unlikely, and (in most games) unlucky, streak: a 1, followed by another 1, followed by another 1! The odds of this happening are 1/6 × 1/6 × 1/6 = 1/216 ≈ 0.5%. This is an unlikely event. Are you surprised? Do you doubt the fairness of the die? Do you tell the die it had better fix itself before you roll it again?
Most people, even those without any statistical training, would say no. What happened was unlikely, but the die was behaving as expected otherwise. Additionally, if we think hard enough about it, we can explain the mechanism behind the randomness of die rolls. Once the die leaves your hand, it is subject to the laws of physics, and its outcome is entirely predictable. The reason die throws are random is that slight changes in velocity and orientation at the moment of release result in different outcomes. Nobody has the fine motor skills necessary to cheat at die rolling by making some numbers appear more frequently than others.
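If you'd like to convince yourself of that 1/216 figure, here's a minimal simulation in Python; the only assumption is a fair six-sided die:

```python
import random

random.seed(42)

N_TRIALS = 1_000_000  # number of three-roll sequences to simulate
streaks = 0

for _ in range(N_TRIALS):
    # Roll a fair six-sided die three times
    rolls = [random.randint(1, 6) for _ in range(3)]
    if rolls == [1, 1, 1]:
        streaks += 1

print(f"Observed rate of triple 1s: {streaks / N_TRIALS:.3%}")
print(f"Theoretical rate: {1 / 216:.3%}")  # ≈ 0.463%
```

Run it a few times with different seeds: the observed rate hovers around 0.46%. Unlikely, but far from impossible.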
We can formalize this using the idea of a statistical population, which is "a set of similar items or events which is of interest for some question or experiment" (Wikipedia). In this case, the items of interest are die rolls. As velocity and orientation vary, we get different outcomes.
When we roll the die, what we’re doing is sampling from this population. The population of all possible throws (even the ones where the die drops out of your hand or is hurled out of an open window) contains equal shares of each outcome. If we think of this in terms of area, as is shown in the diagram above, we can imagine that the population has equal areas for each outcome, one through six.

Before we add some complexity, I want to reinforce the idea that although there are only six equally likely die-roll outcomes, there are infinite combinations of velocity and orientation which, obeying the unchanging laws of physics, produce those outcomes. We can imagine doing two distinct types of work with this strong understanding of die rolling:
- We can use our understanding of the equally weighted population outcomes to come up with optimal strategies in different areas such as games, gambling, and business.
- We could use our understanding of the mechanism driving the equal probabilities, namely that the velocity and orientation of the die after release perfectly determine its outcome, to make certain outcomes happen more frequently. A human couldn't do this, but perhaps we could create a robot that throws a die so it comes up 6 every time (a toy simulation of this mechanism follows below).
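To make the mechanism concrete, here is a toy sketch in Python. The deterministic_throw function is an invented stand-in for the physics, not actual mechanics; the point is that a fully deterministic function, fed tiny variations in release conditions that no human could control, still spreads outcomes evenly across the six faces:

```python
import math
import random

random.seed(0)

def deterministic_throw(velocity: float, orientation: float) -> int:
    # A made-up, fully deterministic "physics": the same release
    # conditions always produce the same face. (An illustrative
    # chaotic map only; real dice obey classical mechanics.)
    x = math.sin(velocity * 12.9898 + orientation * 78.233) * 43758.5453
    return int((x - math.floor(x)) * 6) + 1

# Tiny, humanly uncontrollable variations in release conditions...
counts = {face: 0 for face in range(1, 7)}
for _ in range(600_000):
    velocity = random.uniform(1.0, 2.0)             # arbitrary units
    orientation = random.uniform(0.0, 2 * math.pi)  # radians
    counts[deterministic_throw(velocity, orientation)] += 1

# ...still yield roughly 100,000 of each face: determinism plus
# sensitivity to initial conditions looks like randomness.
print(counts)
```

A die-throwing robot, in this framing, is just a machine precise enough to hit the same small region of (velocity, orientation) space every time.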
Adding back complexity
What's the statistical population in our sales example? Most folks would answer that it's all potential sales clients. I've always found something frustrating about this definition: we can't use it to explain why our model returned a 99% chance of landing the sale that we didn't get without admitting that it must be model error. If that's the case, our CEO – who is not an expert in probability and statistics – feels justified in his request.

One way we could address this is to group predictions together into buckets (e.g. 0–10%, 10–20%, …, 90–100%) and compare each bucket's average prediction to its observed close rate. A well-performing model's predictions will line up with what's observed in aggregate.
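Here is a minimal sketch of that bucketing check in Python; the synthetic predicted and outcomes arrays stand in for whatever your real model and sales history would produce:

```python
import numpy as np

def calibration_table(predicted, outcomes, n_buckets=10):
    """Compare average predicted close probability to observed
    close rate within each probability bucket."""
    edges = np.linspace(0.0, 1.0, n_buckets + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (predicted >= lo) & (predicted < hi)
        if mask.sum() == 0:
            continue  # no predictions fell in this bucket
        print(f"{lo:.0%}-{hi:.0%}: predicted {predicted[mask].mean():.1%}, "
              f"observed {outcomes[mask].mean():.1%}, n={mask.sum()}")

# Synthetic example: outcomes drawn at exactly the predicted rates,
# i.e. what a well-calibrated model's sales history would look like.
rng = np.random.default_rng(7)
predicted = rng.uniform(size=5_000)
outcomes = rng.uniform(size=5_000) < predicted
calibration_table(predicted, outcomes)
```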
This still doesn't satisfy my need to understand that predicted 99%, though. Imagine that, in aggregate, we predict that 96% of the sales in the 90–100% bucket will close, but only 95% do. One could still argue that "fixing" the model by bringing down our predicted 99% improves this aggregate accuracy.

Unlike in the die rolling example, we can't easily resample from the population. We can't relive history, pitching Megacorp 100 times in 100 slightly different ways and landing that sale in 99 of those 100 timelines. But we can imagine doing so with multiverse theory! In multiverse theory, specifically Level III: the Many-Worlds Interpretation, all possible outcomes of probabilistic events happen in some universe, and they do so in proportion to their probabilities. I'd like to define a multiverse population as the statistical population (in this example: all potential clients) across all possible universes. Expanding upon the statistical population example above, we can turn every dot, which represented a single potential client, into a circle representing that client's outcomes across all universes. The higher the likelihood of winning that client's business, the greater the proportion of universes in which their business is won, and therefore the larger the area in that client's circle. Note that if I shaded in the entire multiverse population, it would resemble the die rolling diagram I shared earlier.
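We can't sample other universes, but we can simulate the thought experiment. In this Python sketch, each client's sale is a Bernoulli draw in each of many hypothetical universes (the client names and probabilities other than Megacorp's 99% are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2016)
N_UNIVERSES = 100_000

# Modeled probability of closing each client's sale
clients = {"Megacorp": 0.99, "Normalcorp": 0.50, "Longshot Ltd": 0.05}

for name, p in clients.items():
    # One simulated outcome per universe
    wins = rng.uniform(size=N_UNIVERSES) < p
    print(f"{name}: sale closed in {wins.mean():.2%} of universes, "
          f"lost in {(~wins).mean():.2%}")
```

Even at 99%, Megacorp walks away in roughly one universe in a hundred; the universe we happen to live in may simply be one of those.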

Benefits of having a multiverse mental framework
Astute readers will have noticed that we cannot use this multiverse approach to probabilities to prove that our model is correct. We can't hop from universe to universe, tabulating the proportion in which we win the Megacorp sale, to show our CEO that our model is working as intended. In fact, it's impossible to show that any model perfectly explains the world. The best we can do is establish that a model reasonably predicts samples from our population of interest, often by contrasting it with other, less accurate models. Even then, we still rely on assumptions to do so, one of the most important being that the system does not undergo unanticipated structural changes. All of our predictions will be off if such a change happens. In our example, a strong competitor entering or leaving the market would represent such a change.
In my experience, many data scientists ignore or even dismiss non-empirical ideas such as this multiverse approach to grappling with probabilities, but they do so at their own peril. The first benefit of using the theory is that it is an accessible way of explaining probabilities to those who don't have a strong background in them, like the CEO in our example. Explaining your model outputs using multiverse theory may help consumers of your models become more comfortable interpreting them. It can be used on its own or in addition to a short explanation of overfitting to persuade our hypothetical CEO to drop his (potentially) irrational demand to tweak your model until it brings down the Megacorp sale probability.
The second benefit of the multiverse approach to interpreting probabilities is that it gives us a useful structure for what-if analyses. Remember the mechanism for randomness in our die rolling example? I shared that the reason a fair die comes up with each outcome in the same proportion is that small changes – smaller than we can control with our fine motor skills – result in different outcomes. We can show that using a little more force or a different orientation turns a throw that would have resulted in a 1 into one that results in a 2. Unfortunately, business outcomes cannot be predicted perfectly by a system as consistent as classical mechanics. However, we can try to imagine what we could have done differently to increase our chances of closing the Megacorp sale and use those lessons to help with future ones. People do this all the time when they run what-if analyses: one can imagine the CEO thinking "if only I had answered Megacorp's questions about our service personally" or "I wonder what would have happened if we gave them a discount because they'd be an impressive client to have." We need to understand that neither of these actions guarantees the sale, but they can increase its odds, and applying multiverse theory we can imagine flipping universes in which the sale is lost into ones in which it is won. In the diagram below, I demonstrate what this would look like for the Normalcorp client, which originally has a 50% chance of sale, as increases from 50% are easier to see.
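The universe-flipping intuition is also easy to simulate. In this sketch, Normalcorp's baseline 50% chance and the 10-point lift from a hypothetical discount are both assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_UNIVERSES = 100_000

baseline = 0.50  # Normalcorp's modeled chance of closing
lift = 0.10      # assumed boost from, say, offering a discount

# One latent "luck" draw per universe: the sale closes whenever luck
# falls below the close probability. Raising the probability flips
# some losing universes into winning ones without disturbing the
# universes we were already winning.
luck = rng.uniform(size=N_UNIVERSES)
base_wins = luck < baseline
lifted_wins = luck < baseline + lift

flipped = lifted_wins & ~base_wins
print(f"Baseline win rate:  {base_wins.mean():.2%}")
print(f"With the discount:  {lifted_wins.mean():.2%}")
print(f"Universes flipped from loss to win: {flipped.mean():.2%}")
```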

I hope that viewing event probabilities through multiverse theory helps close the gap between understanding the simple probabilities often used in stats courses (e.g. flipping coins, rolling dice, or pulling cards out of a deck) and the complex ones in the real world (e.g. the probability of closing a sale, or the likelihood of converting a user to a paid subscription plan). For some fun explorations of thinking about probabilities through multiverses, I'd recommend the Remedial Chaos Theory episode of Community (and subsequent episodes that reference it), in which a die roll creates multiple timelines, and all of Rick and Morty. Try it out for yourself, share and discuss the idea with others – including me; I'll do my best to respond to comments here or on Twitter – and see if it helps make sense of complex real-world phenomena and enables more rational decision-making.