
What You Don’t Learn About Data in School

How to get usable results with imperfect data

Photo by Bradley Hook from Pexels

There are numerous resources for data scientists and data analysts to learn about Machine Learning, SQL, Python, data visualization, and so on. These are extremely helpful for people starting their data analytics careers and for those trying to improve their skill set. However, school doesn’t prepare you for what I call "data reality". You learn with perfect, complete data, but what you get in the real world is dirty and often incomplete. The ability to take what you learn on perfect data and adapt it to "data reality" is what will make you successful, and here are a couple of examples of how.


Show Directional Results

I was once asked to help marketing set up an A/B test to evaluate the effectiveness of their email series aimed at converting users to start a trial and become paying members. I was pulled away to work on other projects, and we had to wait until a marketing data analyst was hired before the A/B test could be evaluated. Only then, after the test had already been running for 6 months, did we discover it hadn’t been set up correctly: the control and test group proportions weren’t the 50/50 split we had originally intended.
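One quick diagnostic in that situation is a sample ratio mismatch check. Below is a minimal sketch, assuming hypothetical group counts rather than the real experiment's numbers, that compares the observed split against the intended 50/50 with a chi-square goodness-of-fit test.

```python
from scipy.stats import chisquare

# Hypothetical counts -- replace with the actual number of users
# assigned to each group in your experiment.
control_users = 42_000
test_users = 28_000
total = control_users + test_users

# Observed counts versus the counts expected under the intended 50/50 split.
observed = [control_users, test_users]
expected = [total / 2, total / 2]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(f"Observed split: {control_users / total:.1%} vs {test_users / total:.1%}")
print(f"Chi-square p-value: {p_value:.3g}")
# A very small p-value means the observed split deviates from 50/50,
# i.e. a sample ratio mismatch worth investigating before trusting the test.
```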

If this had been a class on A/B testing, you would’ve received perfect test data with the proper 50/50 split and a large enough sample size, and then proceeded to evaluate statistical significance. None of these conditions were met for the actual email test. We couldn’t tell marketing we had to rerun the test and wait another 6 months. How did we salvage this test with imperfect data?

Statistical significance was thrown out because the data didn’t meet the criteria for a proper A/B test. Cohort analysis was the only approach we could come up with to salvage the results. Users were segmented into the control and test groups, then broken down by whether or not they clicked or opened the email, to compare product engagement, trial start rates, and conversion to paying members.
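As a rough illustration, here is a minimal pandas sketch of that kind of cohort breakdown. The DataFrame and its columns (group, engaged_with_email, started_trial, converted_to_paid) are hypothetical stand-ins for whatever your email and product data actually contain.

```python
import pandas as pd

# Hypothetical user-level data; column names and values are illustrative only.
users = pd.DataFrame({
    "group": ["control", "control", "control", "test", "test", "test"],
    "engaged_with_email": [True, False, False, True, True, False],  # clicked or opened
    "started_trial": [True, False, False, True, True, False],
    "converted_to_paid": [False, False, False, True, False, False],
})

# Segment by test/control and by email engagement, then compare rates.
cohorts = (
    users
    .groupby(["group", "engaged_with_email"])
    .agg(
        users=("group", "size"),
        trial_start_rate=("started_trial", "mean"),
        conversion_rate=("converted_to_paid", "mean"),
    )
)
print(cohorts)
```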

In the stakeholder presentation, results were noted as directional only and as not meeting the criteria for statistical significance. Marketing was happy, though, because the test group showed higher engagement and trial start rates compared to control, even if the difference wasn’t significant. The reality is that stakeholders need to report results to their boss, and a positive result, even one that isn’t statistically significant, is better than a negative one.

You may wonder how we would’ve dealt with the results if the test group had ended up with lower engagement than control. Then we might’ve analyzed each email in the series to identify the ones in the test group with lower engagement or a lower trial start rate than control. There are endless ways for an A/B test to go wrong, and adapting to show directional insights is one way to salvage results.

Takeaway: In the absence of perfect data, segment your users to find directional insights. Stakeholders don’t need perfection. Sometimes guidance in the right direction is enough until better data comes along.

Adjust For Data Gaps

Marketing attribution is harder to implement in reality than what you learn in school. The problem is that companies often have little to no tracking in place to attribute sales and conversions to a particular marketing touchpoint. Although I knew about these issues, when I was asked to attribute marketing efforts to revenue I couldn’t very well say it wasn’t possible. How did I attribute revenue to marketing efforts with only partial tracking data?

First, I identified all the marketing campaigns that drove revenue, such as paid marketing ads that led users to start a trial and the trial onboarding email series aimed at converting current trial users to paying members. Then I estimated revenue by applying a combination of individual and aggregate conversion rates, depending on whether I had user-level or only campaign-level information for a given campaign. The results weren’t perfect but were good enough for a first pass until the attribution tracking improved.
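A minimal sketch of that idea follows: where user-level tracking exists, credit the tracked revenue directly; where only campaign-level data exists, estimate revenue from an aggregate conversion rate and an average revenue per conversion. All campaign names and figures here are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical campaign-level data; every figure here is a placeholder.
campaigns = pd.DataFrame({
    "campaign": ["paid_ads", "trial_onboarding_emails"],
    "has_user_level_tracking": [True, False],
    "tracked_revenue": [120_000.0, None],        # known only where tracking exists
    "reached_users": [50_000, 30_000],
    "aggregate_conversion_rate": [None, 0.04],   # e.g. a historical average
    "avg_revenue_per_conversion": [None, 90.0],
})

def estimate_revenue(row):
    # Use tracked revenue when user-level attribution is available;
    # otherwise fall back to an aggregate-rate estimate.
    if row["has_user_level_tracking"]:
        return row["tracked_revenue"]
    return (
        row["reached_users"]
        * row["aggregate_conversion_rate"]
        * row["avg_revenue_per_conversion"]
    )

campaigns["estimated_revenue"] = campaigns.apply(estimate_revenue, axis=1)
print(campaigns[["campaign", "estimated_revenue"]])
```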

Takeaway: Fill in data gaps with aggregate information or an external source until better data comes along. The results won’t be perfect but having directional insights is a good first step.

Final Thoughts

No matter how many classes you take, it’s never the same when you encounter a real-life situation. The key to success is being able to take what you learned and apply it to the data available. While I haven’t covered every imperfect data scenario, I hope this gives you a head start on how to deal with your "data reality".


You might also like…

How I Used a Machine Learning Model to Generate Actionable Insights

How to Present Machine Learning Results to Non-Technical People

How to Translate Machine Learning Results Into Business Impact

