
3 Ways To Overcome Data Quality Challenges in an Analytics Project

Proven Strategies for Tackling Data Quality Issues in Analytics Projects

Cleaning up your Data is like cleaning up your diet; you know you should do it, but the doughnuts are too tempting!

Improving data quality takes discipline, clear processes, and accountability from leadership to prioritise, yet it is often the most overlooked aspect of any data project. If your organisation has been ignoring it for years and you are now in the midst of a project you can’t finish, this article is for you.

Here are three ways to overcome Data Quality issues in any analytics project.


1. Start with the End – What Business Problem Are You Solving?

There is a lot of unnecessary noise when a project is going wrong.

You have to become a prioritisation ninja and understand which data truly matters. If your end output is a customer churn analysis, focus your efforts on the data that supports customer retention offerings. It sounds simple, but in the real world data is messy, and systems are disparate and undocumented.

Starting with the end helps you trace the lineage all the way back to the problematic data and pinpoint exactly where your project resources should focus their effort.

Real-Life Example

I was involved in a project to consolidate outdated client systems and identify the most up-to-date, accurate customer addresses for marketing campaigns. Nearly six months were wasted trying to clean customers’ address data in the source systems, which ultimately proved unnecessary because the requirements had not been translated accurately: only the first line of the address and the postcode were needed to target a customer. The rest of the data could be added from a reference dataset such as Royal Mail’s Postcode Address File (PAF). Start with the end; what are you trying to achieve?
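To make that enrichment idea concrete, here is a minimal pandas sketch of matching on just the first address line and postcode, then pulling the remaining fields from a PAF-style reference file. The file names, column names, and reference columns are hypothetical placeholders, not the real PAF schema (which has its own licensing and structure).

```python
import pandas as pd

# Hypothetical inputs for illustration only.
customers = pd.read_csv("crm_customers.csv")   # assumed columns: address_line_1, postcode
paf = pd.read_csv("paf_reference.csv")         # assumed PAF-style reference with full address fields

def normalise(s: pd.Series) -> pd.Series:
    """Upper-case, trim, and collapse whitespace so the join keys line up."""
    return s.str.upper().str.strip().str.replace(r"\s+", " ", regex=True)

for df in (customers, paf):
    df["postcode_norm"] = normalise(df["postcode"])
    df["line1_norm"] = normalise(df["address_line_1"])

# Everything beyond line 1 + postcode comes from the reference data,
# so the source systems never need to be cleansed field by field.
enriched = customers.merge(
    paf[["postcode_norm", "line1_norm", "town", "county"]],
    on=["postcode_norm", "line1_norm"],
    how="left",
)
```

The point is not the code itself but where the effort goes: normalising two key fields is a far smaller job than cleansing every address attribute in every source system.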


2. Define How Good Your Data Needs to Be – It Won’t Be 100%

Most of the time, your data won’t need to be 100% complete and accurate.

You have to be a reasonable negotiator; stakeholders often want a perfect world. They may have used legacy systems for years and built their processes around them to reach a good level of accuracy. The same expectation can’t be placed on a brand-new system that is still paying off years of technical debt.

Agree on the success criteria up front, and avoid vague targets like "when I feel it’s good enough" or "when it reconciles to my legacy system (which took me 15 years to finesse)". You won’t need 99% accuracy if the data is only used for marketing.
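One way to keep those criteria honest is to express them as measurable thresholds per attribute and report against them. The sketch below assumes a pandas DataFrame of customer records; the attribute names and thresholds are illustrative, not a recommendation.

```python
import pandas as pd

# Hypothetical thresholds agreed with stakeholders up front:
# minimum share of populated values per attribute.
SUCCESS_CRITERIA = {
    "postcode": 0.98,  # critical for targeting
    "email": 0.90,     # some gaps acceptable
    "phone": 0.75,     # nice to have
}

def completeness_report(df: pd.DataFrame, criteria: dict) -> pd.DataFrame:
    """Compare the completeness of each agreed attribute against its threshold."""
    rows = []
    for column, threshold in criteria.items():
        completeness = df[column].notna().mean()
        rows.append({
            "attribute": column,
            "completeness": round(completeness, 3),
            "threshold": threshold,
            "meets_criteria": completeness >= threshold,
        })
    return pd.DataFrame(rows)

# report = completeness_report(customer_df, SUCCESS_CRITERIA)
```

A report like this turns "good enough" from a feeling into a number both sides signed up to.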

Real-Life Example

On a project building service management dashboards for a client, I spent weeks negotiating with a stakeholder over why some incident data did not need to be 100% accurate. We agreed on a prioritisation methodology that charts the importance of each data item on one axis and the cost to fix it on the other.

Items in green were then put in the backlog to fix. Data will always have anomalies; pragmatism over perfection will allow you to go to market quickly.
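If it helps to picture the mechanics, here is a small sketch of that importance-versus-cost triage as simple buckets. The issues, scoring scale, and cut-offs are made up for illustration; in practice the scores and thresholds come out of the negotiation with the stakeholder.

```python
# Hypothetical issues scored 1-5 on importance and cost to fix.
issues = [
    {"issue": "missing incident priority", "importance": 5, "cost_to_fix": 2},
    {"issue": "inconsistent resolver group names", "importance": 3, "cost_to_fix": 4},
    {"issue": "legacy tickets without timestamps", "importance": 2, "cost_to_fix": 5},
]

def classify(issue: dict) -> str:
    """Bucket an issue: fix now, add to the backlog, or accept the anomaly."""
    if issue["importance"] >= 4 and issue["cost_to_fix"] <= 3:
        return "fix now"
    if issue["importance"] >= 3:
        return "backlog"
    return "accept"

for issue in issues:
    print(issue["issue"], "->", classify(issue))
```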


3. Understand the Resolution Path – Requestor, Payer, and Executor

Finding the issue is one problem; fixing it is another.

Most organisations have messy structures, and governance models are rarely embedded or clear, so projects usually end up delivering more than they bargained for. With that in mind, a responsibility and accountability matrix is a good idea.

The marketing team may raise the issue, but since it’s Finance data, that team may have to fix it. And because neither team has the resources, your project team may end up overseeing the resolution. Agree on ownership up front, so that when you uncover data quality issues later, you already know the path to resolving them.
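Even something as simple as a lookup of requestor, payer, and executor per data domain, agreed before issues surface, removes the argument later. The domains and team names below are hypothetical examples.

```python
# Hypothetical ownership matrix agreed up front: who raises issues (requestor),
# who funds the fix (payer), and who does the work (executor) for each data domain.
RESOLUTION_PATHS = {
    "customer_address": {"requestor": "Marketing", "payer": "Finance", "executor": "Project team"},
    "incident_data": {"requestor": "Service desk", "payer": "Project team", "executor": "Project team"},
}

def resolution_path(domain: str) -> dict:
    """Return the agreed resolution path, or flag that governance needs to decide."""
    if domain not in RESOLUTION_PATHS:
        raise ValueError(f"No agreed resolution path for '{domain}' - escalate to governance")
    return RESOLUTION_PATHS[domain]

print(resolution_path("customer_address"))
```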

Real-Life Example

A long-running data transformation project I was part of required a significant amount of legacy data to be cleansed. The timelines to fix the old, archaic systems ran into months, which would have derailed the project. The project team decided to invest heavily in steps 1 and 2 (prioritisation and negotiating with the end user). Once we had a list of critical attributes that had to be fixed, we agreed that the cost to fix would be shared between the project team (as it was critical to the success of the project) and the end-user team (as it was critical for their reporting). This was a compromise in the absence of a robust Data Governance framework.


Conclusion

Done is better than perfect – this is etched in my brain after delivering numerous analytics projects. As organisations embark on their transformation journey, the cost of poor-quality data should not be underestimated.

You wouldn’t have to deal with so many Data Quality challenges tactically if a strategic framework were in place and operational. It is one of the hardest things to achieve, but it also delivers the most significant ROI.

Want to learn how to do that? Check out my FREE Ultimate Data Quality handbook and join my Medium email subscriber list.


