How Do Data Scientists Ask the Right Questions?

Audrey Lorberfeld
Towards Data Science
Nov 2, 2018


I love questions. And, over the years, this love has allowed me to get pretty good at not only asking questions, but asking the right questions.

Studying ancient Greek and Latin in college made me intimately familiar with some of the best devil's advocates out there: Socrates, Aristotle, and Plato. While getting my Master's in Library and Information Science, I got to apply scholarly inquiry frameworks from critical-thinking powerhouses like Marcia Bates and Brenda Dervin. And now, at the Flatiron School's Data Science Immersive, I get to learn the data-science way of questioning.

So, what does asking the right questions look like for a data scientist?

If you take a few seconds to Google “asking the right questions data,” you’ll retrieve dozens of ask-these-few-questions-and-you-can’t-go-wrong think pieces. These quippy articles are more likely than not embellished with diagrams like the one below:

Image from Piyanka Jain's piece "3 Key Analytics Questions to Ask Your Big Data" (2012).

The steps these images usually outline are something along the lines of: 1) agree on the KPIs against which you will measure success (Jain's Measurement Framework above), 2) define those KPIs (Jain's Portfolio Analysis above), and 3) identify your stakeholders (Jain's Customer Analysis above). Once you do all of that, voilà! You have everything you need to craft a bona fide question worthy of data-driven exploration.

And while answering the above questions should be part of any data-driven analysis, simply answering them seems to fall short of what data scientists do when they "do" data science. So what's the extra something we bring to the table (besides our incredibly cool grasp of calculus, linear algebra, statistics, probability, and programming, of course)?

Well, after being in this field for a whole two weeks, I feel qualified to say that the data-science way of questioning derives its unique power from…wait for it…SCIENCE!

(And no, I don’t mean that data scientists stand in a lab looking at bits and bytes under a microscope all day. I mean that data scientists question their worlds by employing techniques traditionally associated with the life sciences.)

This may seem obvious to some — science is literally in the name of the profession — but it remained a bit nebulous to me until writing this piece.

To remind ourselves what “science” entails, let’s take a trip back in time to the fifth grade and the scientific method. The first step in the scientific method (or the second, if you count Observe as the first) is to Question.

Image source: https://www.slideshare.net/jamesbirchler/experimenting-your-way-to-success-applying-lean-startup-principles-to-product-development-at-imvu

Back in the fifth grade, after asking a question, we were taught to form a hypothesis, then to test that hypothesis, and finally to analyze our results. But we were seldom encouraged to linger in the space between observing our data and drawing our conclusions (plus, why would we want to? The cool part was proving that vinegar mixed with baking soda makes things explode).

But that same liminal space we glossed over as fifth graders, the space where we could have tweaked our question(s) again and again according to some type of data-based feedback loop, seems to be where the data-science Special Sauce™ is made.

If we dive into some data-science pedagogy, we can see the similarities between the scientific method and what Joe Blitzstein and Hanspeter Pfister call “The Data Science Process”:

Image source: https://insidebigdata.com/2014/11/12/ask-data-scientist-data-science-process/

You’ll quickly see that the scientific method and The Data Science Process are very, very similar.

(If you want an even sexier version, check out Microsoft's Data Science Lifecycle and replace "Business Understanding" with "ask a question.")

And, most pertinent to exploration, they both emphasize iteration (Birchler's scientific method diagram with its "rapid iteration" call-out and Blitzstein & Pfister with their ever-rotating arrows). The "science" in these processes seems to manifest in a question-test-analyze loop that we are supposed to repeat again and again. By iterating through these steps, data scientists are able to craft the right questions.

These right questions are those that Facebook Data Scientist Brandon Rohrer calls “sharp.” In “How To Do Data Science” he writes:

When choosing your question, imagine that you are approaching an oracle that can tell you anything in the universe, as long as the answer is a number or a name. It’s a mischievous oracle, and its answer will be as vague and confusing as it can get away with. You want to pin it down with a question so airtight that the oracle can’t help but tell you what you want to know. Examples of poor questions are “What can my data tell me about my business?”, “What should I do?” or “How can I increase my profits?” These leave wiggle room for useless answers. In contrast, clear answers to questions like “How many Model Q Gizmos will I sell in Montreal during the third quarter?” or “Which car in my fleet is going to fail first?” are impossible to avoid.

And of course, because we are ~doing science~ here, Rohrer’s post ends with the imperative “Then start over.”
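One way to make Rohrer's "a number or a name" test concrete (my framing, not his) is to try writing your question as a function whose return type is a single number or a single name. If you can't, the oracle still has wiggle room. Here's a minimal sketch for his fleet example. The `Car` class, the `which_car_fails_first` function, and the fleet values are all hypothetical, invented purely for illustration; in a real analysis, the predicted days to failure would come from a model you'd still have to build and validate.

```python
from dataclasses import dataclass


@dataclass
class Car:
    # Hypothetical record: a car ID plus a (made-up) predicted number of
    # days until that car fails.
    car_id: str
    predicted_days_to_failure: float


def which_car_fails_first(fleet: list[Car]) -> str:
    """Sharp question: "Which car in my fleet is going to fail first?"

    The answer is a single name (a car ID), so it can't be dodged.
    """
    if not fleet:
        raise ValueError("cannot answer for an empty fleet")
    # The car with the smallest predicted days to failure fails first.
    return min(fleet, key=lambda car: car.predicted_days_to_failure).car_id


# By contrast, a vague question like "What can my data tell me about my
# business?" has no obvious return type: a hint that it needs sharpening.

if __name__ == "__main__":
    # Illustrative fleet only; these IDs and numbers are not real data.
    fleet = [
        Car("van-07", 41.5),
        Car("truck-12", 12.0),
        Car("sedan-03", 88.2),
    ]
    print(which_car_fails_first(fleet))  # prints "truck-12"
```

Of course, once the answer comes back, the next sharp question (why that car?) starts the loop over.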

Data science educator Raj Bandyopadhyay, in “The Data Science Process: What a data scientist actually does day-to-day,” similarly emphasizes the iterative process of questioning as the first step in a real data science analysis:

You start by asking a lot of questions . . . Once you have a reasonable grasp of the domain, you should ask more pointed questions to understand exactly what your client wants you to solve . . . [and] Bingo! You can now see the data science in the problem . . . [and] . . . you can frame the . . . request into data science questions . . . [and once] . . . you have a few concrete questions, you [can] go back to the VP Sales and show her your questions.

…and on and on until you arrive at the question(s).

In summary, while think pieces that purport to offer a paint-by-numbers solution for data analysis are useful for aligning on business objectives and making your boss happy, know that data science is so much more.

Data science is about the science! And science is about iterative questioning, which can seldom be wrapped up into a transferable do-this-then-that package. That is why we data scientists command our famously high salaries and are magical tech unicorns worthy of those salaries and that reverence: we sit at the nexus of critical thinking, business acumen, and hard skills.

Einstein knew what's up when he said:

If I had an hour to solve a problem and my life depended on it, I would use the first 55 minutes determining the proper question to ask, for once I know the proper question, I could solve the problem in less than five minutes.

If I had an hour to solve a problem, I’d probably allocate at least 10 of those minutes to reading Stack Overflow, but hey, Einstein had to work with what he had.
