Monty Hall’s Assistant
The Danger (and Power) of Intuition in Data Science
As I start my new journey as a career Data Scientist, I’ve had to cross paths with my old hypothetical nemesis: The Monty Hall Problem. To those of you who have a deep understanding of Bayesian statistics, welcome to Talking to People Who Hate Math 101. To those of you who don’t know what I’m talking about, let’s discuss how the human understanding of ‘truth’ and ‘luck’ can lead us astray, with the help of a couple of goats.
Data and Communication
This article is meant to be helpful to two kinds of people: those with extensive statistical experience, and those who vomit in their mouths at the thought of conditional probabilities. As the field of data science evolves and deepens, as we dive further into the complexities of Machine Learning, we are in danger of creating a Comprehension Gap between data scientists and the stakeholders who employ them. In my experience, C-level executives tend to lack the intensive math background to understand technical jargon. It isn’t their job to know that stuff. Their job is to lead, to set OKRs, to manage people, to hire you to understand it. Our job, as data scientists, is to be experts in our field and translate the underlying network of interlocking mathematical theories into bite-size presentations.
A Data Scientist cannot crunch numbers in a vacuum. As mathematicians, we have to both uncover the truth and share the truth. And sharing the truth is typically where this process breaks down.
The Issue with Intuition
Humans are pattern-making beasts. We look to the past to inform our actions. We are also spiteful and chaotic. If you believe that human behavior can be perfectly predicted at a granular level, you’re in the wrong field, or you’re going to get a rude wake-up call. Here’s a fun example:
The Salesperson – You’ve been hired to analyze the sales patterns at a software company, and so you, the diligent data scientist, begin crunching numbers: running ANOVA tests, A/B testing, even building a beautiful model that predicts, with 80% certainty, that the best time to schedule a follow-up call with an account is 17 days after the first email. Your data says this will, on average, increase sales. Done deal, right? You tell the sales team your findings: if they do that one simple thing, call clients at the 17-day mark, their sales will increase. You have the data! The truth is in your hands!
Well, Salesperson A replies with, "Yeah, that’s not how I do it. I like to get in on day 2. Day 17 is too late."
How do you respond? Do you tell them they’re wrong? That the data is clear? That the other salespeople following the 17-day rule have a higher conversion rate? For some people that might be enough, but Salesperson A trusts their gut. How do you convince them?
The Monty Hall Problem
This is both my favorite and least favorite thought experiment in probability. I’ve always had difficulty understanding probabilities intuitively. Sure, any layman gets that the chance of rolling a 6 on a six-sided die is 1/6, but ask them for the probability of rolling a total of 7 on two dice while also flipping tails on a coin, and heads start exploding. I have always struggled with Monty Hall, but it’s the best tool to show how misleading intuition can be.
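(For the record, that compound probability isn’t so scary once you write it down: 6 of the 36 dice combinations sum to 7, and the coin shows tails half the time, so the answer is 6/36 × 1/2 = 1/12. If you’d rather let a computer do the counting, here’s a minimal brute-force sketch, nothing more:)

```python
from itertools import product

# Enumerate every equally likely (die 1, die 2, coin) outcome
outcomes = list(product(range(1, 7), range(1, 7), ["heads", "tails"]))

# Count outcomes where the dice total 7 and the coin lands tails
hits = sum(1 for d1, d2, coin in outcomes if d1 + d2 == 7 and coin == "tails")

print(f"{hits} / {len(outcomes)} = {hits / len(outcomes):.4f}")   # 6 / 72 = 0.0833, i.e. 1/12
```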
Goat or Car? Monty Hall hosted a popular game show, Let’s Make a Deal, but it won’t be remembered for its charming host or musical guests; it will be forever inscribed into the annals of Bayesian Statistics because of one dumb game. The setup is seemingly simple: the contestant has a chance to win a new car, but it’s hidden behind one of three doors. Behind each of the other two doors stands a goat.
(Side note – why goats? Who took care of these goats? Did they have a Monty Hall petting zoo? Were the goats enemies? Friends? Lovers? I have a lot of headcanon questions, but I digress.)
The game: the contestant picks one of the three doors. After the choice is made, the assistant opens one of the other two doors to reveal a goat. So two doors remain, your door and one more. Which one has the car behind it? Well, you’re given an option: stay with your current door, the door you picked in the first place, the door that spoke to you, the door that the universe said ‘that’s the freaking door!’…or do you change your mind?
If you don’t know the twist, you probably think you want to stay with your first choice, right? Why is that? Is it because changing your mind is seen as weak? Is it because you believe that luck is on your side? Those internal, invisible questions are why Monty Hall got to keep playing this game for years without giving away too many cars. They turn a statistical game into a judgement on you and your intelligence. "Of course I picked right the first time," we think, and then we get to walk home instead of drive.
Probabilistically? You’re wrong. If you play the Monty Hall game and you always stay with your first choice, you will only succeed 1/3 of the time. However, if you always swap, you win 2/3 of the time.
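You don’t have to take my word for it; you can let a computer play the game a hundred thousand times. Here’s a minimal simulation sketch (the function name and structure are mine, not anything canonical):

```python
import random

def play_monty_hall(switch, n_doors=3, n_trials=100_000):
    """Return the fraction of games won with a fixed stay-or-switch strategy."""
    wins = 0
    for _ in range(n_trials):
        car = random.randrange(n_doors)      # where the car actually is
        pick = random.randrange(n_doors)     # the contestant's first choice
        # The assistant opens one door that is neither the pick nor the car
        opened = random.choice([d for d in range(n_doors) if d not in (pick, car)])
        if switch:
            # Move to a random closed door other than the original pick
            pick = random.choice([d for d in range(n_doors) if d not in (pick, opened)])
        wins += pick == car
    return wins / n_trials

print(f"always stay:   {play_monty_hall(switch=False):.3f}")   # ~0.333
print(f"always switch: {play_monty_hall(switch=True):.3f}")    # ~0.667
```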
How Intuition Messes Us Up
This choice might feel ‘wrong’ to you. It goes against your gut. Now, I could sit here and explain Bayes theorem and why 2/3 works, but that won’t satisfy your annoyance at being wrong. At this point, you either know the theorem, or you don’t care about it. Those probabilities just don’t feel right. Why not? Let’s break it down.
What’s the chance of picking the right door in the first place? Well, it’s an easy 1 in 3. There’s no way to increase those odds: you pick a door, and behind it is either a car or a goat. But then…a door opens.
What’s the chance of picking the right door a second time? This is where the brain trips up, and it’s for two reasons. Trap number 1) You aren’t being asked to ‘pick a door’; you’re being asked to ‘change your mind’. Anyone who follows politics knows that changing your mind is tantamount to weakness, and so people have no desire to do it. Trap number 2) You see this choice as independent of the first one. That means, when you’re asked to pick between two doors, it feels like there’s a 50% chance of success…so both doors are equally likely to have a car, right?
Nope.
Both of these approaches are wrong and you’ll see why just by changing our perspective a little.
New Perspective #1 – What if there were 10 doors? It’s the same game, but one car and ten doors (which means 9 goats, damn that Monty Hall petting zoo is getting crowded!). You pick a door, and then they reveal a goat behind one of the other 9 doors. You now have 8 other doors to choose from. Do you stay with your 1-in-10 chance? Or do you pick another door? I expect that in this version of the game, it would be foolhardy to NEVER examine your first choice. Every open door is more information; it’s another option gone, and your chances of finding the correct door increase. This is why the Monty Hall game is played with 3 doors and not 4: if you ‘change your mind’ once, you’re apt to change it multiple times. So that dastardly Monty only gives you one chance to change your mind.
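Running the same simulation sketch from earlier with ten doors (and the assistant still opening just a single goat door, as in this version of the game) shows that switching to a random remaining door still beats staying, just by a slimmer margin:

```python
print(f"stay,   10 doors: {play_monty_hall(switch=False, n_doors=10):.3f}")   # ~0.100
print(f"switch, 10 doors: {play_monty_hall(switch=True,  n_doors=10):.3f}")   # ~0.113, i.e. 9/10 x 1/8
```

And if the assistant kept opening goat doors until only your door and one other remained, switching would win a whopping 9 times out of 10.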
This is the essence of Bayesian Statistics. Reverend Thomas Bayes asked the question, "How do I find something if I have no information about it?" Well, you take a guess, see where you landed, and then try again. It is an iterative and cumulative process of gathering more and more experimental data to get you closer to answering a question that is difficult to answer directly. This makes sense, right? The more experiments you do and the more knowledge you glean, the better your choices will be.
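For the curious, here’s what that update looks like applied to the original three doors. This is just a back-of-the-envelope sketch, assuming you picked door 1 and the assistant then opened door 2 to reveal a goat:

```python
# Bayes update for the three-door game: you picked door 1, the assistant opened door 2.
prior = {1: 1/3, 2: 1/3, 3: 1/3}   # P(car behind each door) before any doors open
likelihood = {
    1: 1/2,   # car behind your door: the assistant could open 2 or 3, so door 2 half the time
    2: 0.0,   # the assistant never opens the car's door
    3: 1.0,   # car behind door 3: the assistant is forced to open door 2
}
unnormalized = {door: prior[door] * likelihood[door] for door in prior}
total = sum(unnormalized.values())
posterior = {door: round(p / total, 3) for door, p in unnormalized.items()}
print(posterior)   # {1: 0.333, 2: 0.0, 3: 0.667}
```

The prior 1/3 on your own door never moves; the 1/3 that belonged to the opened door flows entirely to the door you didn’t pick.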
New Perspective #2 – What about the assistant? What if we’re looking at this problem all wrong? We’ve been focused on that contestant trying to get a new Honda Civic. What if we consider the POV of the assistant who has to open the doors? I know, it’s not a hard job (unless there are specific goat-related challenges that come up), but the problem suddenly coalesces into something that makes sense when you watch the game from the sidelines.
Let’s say the contestant picks door 1 and the car is behind door 3. Which door does the assistant remove? Door 2. They can’t open door 1, because you chose that one and that would ruin the game. They can’t open door 3, because, you know, that would be too easy. And so, when the contestant picks the wrong door, only one of the three doors is available for the assistant to remove.
What if you pick the correct door? Then the assistant has 2 doors to pick from! Which do they choose? Who knows! That must be the exciting part of the job. In this case, two of the three doors are available.
Remember, you only had a 1/3 chance of being right in the first place. Which means, 2/3 of the time, the assistant has only one option, and the remaining closed door has the car behind it. Therefore, 2/3 of the time, staying with your first choice is the fool’s choice. The removal of the door is new information, and it DOES affect your next choice. The assistant doesn’t remove a door at random; they follow an algorithm: never open your door, never open the car’s. That’s why the game is weighted. In essence, you’re only picking between two doors, not three; one has a 1/3 chance of success and the other has 2/3.
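If it helps, you can enumerate the assistant’s situation directly. Here’s a tiny sketch (the contestant is fixed on door 1 purely for illustration, since the labels don’t matter):

```python
# The contestant always picks door 1; walk through each equally likely car position.
for car in (1, 2, 3):
    # Doors the assistant is allowed to open: not the contestant's, not the car's
    allowed = [d for d in (1, 2, 3) if d != 1 and d != car]
    freedom = "free to choose" if len(allowed) == 2 else "forced"
    outcome = "loses" if car == 1 else "wins"   # switching wins whenever the first pick was wrong
    print(f"car behind door {car}: assistant may open {allowed} ({freedom}); switching {outcome}")
```

Two of the three equally likely rows are "forced," and in both of those rows switching wins — the 2/3 again.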
Suddenly those probabilities make sense! And it simply required a shift of perspective.
Back to the Salesperson
To the strong-willed, a change of perspective is a challenge. Convincing someone to go against their gut, even with a wealth of data on hand, is always going to be an uphill battle. Remember, your job as a data scientist isn’t to change someone’s world; it’s to communicate the truth as effectively as possible.
But, maybe, just maybe, you can sow a little doubt to encourage those headstrong individuals to look outside of what ‘works’ and see if they can discover ‘what works better’.
Our job is to discover and communicate truth. We study, read, and blog about the discovery part, but all of that work is nothing without Communication. For many of you starting to code, playing with pandas, exploring GitHub, you’re learning the tools of the trade. But never forget, the most difficult tool to master is how to convince people who just want to feel right.