Making Data Useful

Your dataset is a giant inkblot test

The danger of apophenia in analytics and what you can do about it

Cassie Kozyrkov
Towards Data Science
6 min read · Jul 17, 2019


There’s a fine line between telling stories with data and telling lies. Before I tell you how to spot a top-notch data analyst and boost your analytical excellence, let me scare you a little.


The psychological trap in data analytics

Human brains are pattern-finding powerhouses… but those patterns don’t always have much to do with reality. We are the sort of species that finds rabbits in clouds and Elvis’s face in a potato chip.

Do these look like a rabbit and a portrait of Elvis to you? Image: SOURCE.

Take a moment to consider the Rorschach test — the one where people are shown inkblots and asked what they see — and you’ll appreciate just how eagerly the mind injects spurious interpretations into apparent randomness.

Bat? Butterfly? Or just an inkblot? This is the first of the ten cards in the Rorschach test, created in 1921. Image: Wikipedia

Psychologists have a pretty name for this tendency to conjure false meaning out of nothing: apophenia. Give humans a vague stimulus and we’ll find faces, butterflies, and a reason to allocate budget to our favorite project or launch an AI system.

Uh-oh.

There’s plenty of random noise in most datasets, so what are the chances there’s no apophenia going on with your analytics? Can you really trust your interpretation of the data?

What the mind does with inkblots it also does with data.

To make matters worse, the more complex those datasets are and the more ways there are to slice and dice them, the vaguer they are as stimuli. That means they’re practically begging you to see nonsense in them.

Complex datasets practically beg you to find false meaning in them.
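To make the slice-and-dice point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the outcome and the yes/no attributes are pure random noise, and `largest_gap` is a hypothetical helper, not something from a real analytics library. It reports the biggest gap it can find between any one slice's average and the overall average.

```python
import random


def largest_gap(n_rows, n_attributes, seed=0):
    """Largest gap between a slice's mean outcome and the overall mean.

    Everything here is random noise: the outcome and every attribute
    are generated independently, so no slice has any real effect.
    """
    rng = random.Random(seed)
    outcome = [rng.random() for _ in range(n_rows)]
    overall = sum(outcome) / n_rows

    best = 0.0
    for _ in range(n_attributes):
        # A made-up yes/no attribute that is unrelated to the outcome.
        flags = [rng.random() < 0.5 for _ in range(n_rows)]
        segment = [y for y, flag in zip(outcome, flags) if flag]
        if not segment:
            continue
        gap = abs(sum(segment) / len(segment) - overall)
        best = max(best, gap)
    return best


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(f"{k:>5} ways to slice -> largest gap {largest_gap(200, k):.3f}")
```

Because the seed is fixed, each run with more attributes re-examines the same slices plus some new ones, so the largest gap can only stay the same or grow. None of those gaps means anything; the "best" slice is simply the luckiest one.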

Are you sure your latest data epiphany isn’t an apophany in disguise?

Another great word is pareidolia, a kind of apophenia that involves finding familiar things in vague sensory stimuli. In Japan, they even have a museum of rocks that look like faces. It’s a beautiful world.

Lies, damned lies, and analytics

If that sounds dismal, I’m not done yet. Taking data analysis courses can pour fuel on that psychological fire. Students are conditioned to expect that looking at data yields real meaning because every exploratory analysis homework exercise has buried treasure in it. Very few professors have the heart to send you on wild goose chases (for your own good!) and it’s hard to grade open-ended assignments, so you usually don’t get enough exposure to them as a student.

Students grow up believing that every dataset is ready to cough up a nugget of solid truth.

Data storytelling is just a hop, skip, and jump away from outright lying with data. Setting aside the issue of whether the patterns are real, let’s talk about multiple interpretations. Just because you see a bat shape in that inkblot doesn’t mean that there isn’t also a butterfly, a pelvis, or a pair of foxes in it. If I hadn’t mentioned the foxes, would you have seen them? Probably not. Psychological mechanisms related to motivation and attention have stacked the deck against you. It takes a special sort of skill to release the bat interpretation and force yourself to see a superposition of meanings.

Once people glom on to their favorite “insight”, they’ll struggle to unsee it.

The trouble is that once people glom on to their favorite “insight”, they’ll struggle to unsee it in favor of others. People tend to believe most strongly in whichever interpretation captured their attention first and each additional meaning reduces their motivation to keep searching. Juggling multiple potential stories without overweighting your favorite is a mental muscle that takes hard work to build. Alas, not every analyst has the discipline for it. In fact, many are incentivized to “prove” one side of a story through data exploration. Why grow skills that only get in the way of engorging your data science paycheck?

What color is your lightsaber?

There are ways to prove things with data honestly and rigorously (my data-splitting article will tell you more), but exploratory data analysis (EDA) is not one of them. Open-ended data exploration is always a fishing expedition. What determines the color of your lightsaber is what you’re fishing for.

If you join the dark side, you’re fishing for evidence to support a theory you already “know” to be true (so you can sell it to some naive victim). You might not even realize that your lightsaber is red if you genuinely believe in data objectivity and your own unbiasedness.

Open-ended data exploration is always a fishing expedition.

With a sufficiently complex (vague) dataset, you’ll find a pattern you can spin as support for your favorite story. That’s the beauty of the Rorschach test, after all. Unfortunately, it’s worse with data than with inkblots because the more mathemagical your method (p-hacking, anyone?), the more legitimate and convincing you’ll sound to those who don’t know any better.
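Here is a rough sketch of how that plays out, again under loudly stated assumptions: the target and all the features are independent Gaussian noise, the function names are made up for this example, and the p-value uses the standard Fisher z-transform approximation rather than an exact test. The code tests every feature against the target and counts how many clear the significance bar.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def p_value(r, n):
    """Approximate two-sided p-value for r via the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_discoveries(n_rows=100, n_features=200, alpha=0.05, seed=1):
    """Count noise features that look 'significantly' related to a noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        # Each feature is fresh noise, so any "significant" result is spurious.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if p_value(pearson_r(feature, target), n_rows) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print("spurious 'discoveries':", count_false_discoveries())
```

With a threshold of 0.05, you should expect roughly one in twenty noise features to pass on average, so testing hundreds of them all but guarantees a handful of "findings". Report only those and you have a convincing-sounding story built on nothing.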

Satellite photo of the “Face on Mars”, which many people took as evidence of extraterrestrial habitation.

Those who reject the dark side also go fishing, but they’re after something else: inspiration. They’re looking for patterns that might be interesting or compelling, but they know better than to take them as evidence. Instead, they practice a sort of open-minded analytics zen with the discipline to be mindful of as many interpretations as possible.

The best analysts challenge themselves to find as many interpretations as possible.

This takes a sharp eye and a humble, unsticky mind. Rather than tricking their stakeholders into seeing only one side of a story, they challenge themselves to do the creative thinking required to digest the same data into as many stories as possible. They present their findings in a way that inspires rigorous follow-up without causing their leadership team to run overconfidently off a cliff.

Open-mindedness gives data analysis a chance to be worthwhile.

As an added bonus, the discipline to look for multiple interpretations is an analyst’s secret weapon for not snoozing past the real treasures buried in the data. If you’re distracted by a falsehood you believe in, confirmation bias makes it hard to notice evidence that points in the opposite direction. Why bother analyzing anything if your conclusions are determined in advance? Open-mindedness gives the whole endeavor a chance to be worthwhile.

This grilled cheese sandwich fetched $28,000 at auction because it features the Virgin Mary. Alternative interpretations of what we’re seeing, anyone?

Hiring a great analyst

If you liked my other articles about analytics, here are the traits you’re already looking for in a great analyst:

  • They don’t make inferences that reach beyond the data they’re exploring. [1]
  • They’re handy with data science tools and have the skills to sift through vast datasets quickly. [2]
  • They have relevant domain knowledge so they’re less likely to waste stakeholders’ time with trivia. [3]
  • They understand that their work is about prospecting for inspiration. [3] [4]
  • They visualize data in a brain-friendly way so that time-to-inspiration is kept as short as possible. [3]
  • They know what it takes to follow up rigorously on any potential insights they found (and whom to call for help with that). [4] [5] [6] [7]

In addition to all that, this article suggests you look for analysts with three more traits:

  • They’re aware that the mind finds meaning where it doesn’t exist, so they stay humble and avoid jumping to conclusions.
  • They don’t try to sell you a story found by torturing data until it confesses. Instead, they use hedging/softening language when talking about data.
  • They have the discipline to come up with multiple interpretations for everything. The faster they produce multiple explanations and the more alternatives they generate, the more the force is with them. Try interviewing for this skill next time you’re hiring an analytics Jedi.

Finally, if you’re a leader, turn a critical eye inward and make sure that you’re giving your people the right incentives. Are you looking for a data analyst or a data spin doctor? These take different mindsets (and skillsets!), so choose wisely and reward the right behaviors.

Forget potato chips! The Chinsekikan museum in Chichibu, Japan features an Elvis that really is the King of Rock.

Thanks for reading! How about an AI course?

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the entire course playlist here: bit.ly/machinefriend


Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita