Recently, I was asked: what is even more important than data quality? That made me realise I might be providing an incomplete narrative in my Data Quality (DQ) blogs.
DQ is undoubtedly an essential aspect of data; however, there are many more facets to healthy data than quality alone, and merely improving data quality will not give you all the benefits. I have distilled these facets down to five questions about our data that we ought to get right.
Let’s dive into these:
1. How accessible is the data?
Data accessibility has been the challenge of the decade. So much data but no way to access it. Draconian corporate access policies, limited knowledge across the teams, single points of failure, over-reaching regulatory policies. You know where this is going.
An essential facet of your overall Data Health is how easily accessible the data is. Are the tools and processes available for someone to get hold of that data? If processes exist, do they depend on manual work by your IT teams, or are they automated?
Data stored in a siloed warehouse or lake has zero value; it needs to be in the hands of the business users so they can make critical decisions or find new insights. Data democratisation is trying to solve this issue; however, it is rarely implemented correctly (that’s a blog for another time!).
2. Does the data have any context?
Right, I have access to this data, but I can make no sense of it. The columns have technical names, the logic and aggregations are unexplained. The knowledge is in the Engineer’s brain. The data has no business glossaries or catalogues. Imagine trying to find a product on Amazon without any catalogues or descriptions.
Metadata of the lake needs to be captured to provide context to the data. You should set up a basic search functionality with technical and business data interpretation. At a bare minimum, the data needs to be catalogued and searchable.
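As a rough illustration of the "catalogued and searchable" bare minimum, here is a minimal sketch in plain Python. The `CatalogueEntry` fields, the column names and the `search` helper are all hypothetical and only show the idea of pairing a technical name with its business meaning; a real catalogue tool would do far more.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One catalogued column: its technical name plus its business meaning."""
    technical_name: str
    business_name: str
    description: str
    tags: list[str] = field(default_factory=list)


def search(catalogue, term):
    """Case-insensitive substring search across names, description and tags."""
    needle = term.lower()
    return [
        entry for entry in catalogue
        if needle in " ".join(
            [entry.technical_name, entry.business_name, entry.description, *entry.tags]
        ).lower()
    ]


# Hypothetical entries, for illustration only.
catalogue = [
    CatalogueEntry(
        "cust_ltv_90d",
        "Customer lifetime value (90 days)",
        "Sum of net revenue per customer over the trailing 90 days.",
        ["finance", "customer"],
    ),
    CatalogueEntry(
        "ord_cnt_dly",
        "Daily order count",
        "Number of orders placed per calendar day.",
        ["sales"],
    ),
]

for entry in search(catalogue, "revenue"):
    print(entry.technical_name, "=", entry.business_name)
```

A business user searching for "revenue" finds the cryptically named column along with an explanation of how it is calculated, without having to ask the engineer who built it.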
3. How is the quality of the data?
Didn’t I say the quality is still important? Now that you have access to the data and understand the context behind it, is the data of good enough quality to be consumed? Do you have missing columns/rows? What about duplicate entries? Are there consistency issues between tables?
These are all common DQ issues that ought to be resolved before self-service on that data can become a reality. I have written extensively about this; see the article below.
Apply Data Quality Checks at These 5 Points in Your Data Journey
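To make the three questions above concrete (missing values, duplicate entries, consistency between tables), here is a minimal sketch in plain Python. The table contents, column names and helper functions are hypothetical; in practice you would likely run equivalent checks inside your warehouse or with a dedicated DQ tool.

```python
from collections import Counter


def missing_values(rows, required):
    """Return (row_index, column) pairs where a required value is absent or empty."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in required
        if row.get(col) in (None, "")
    ]


def duplicate_keys(rows, key):
    """Return the key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphaned_keys(child_rows, parent_rows, key):
    """Return key values used in the child table that are missing from the parent."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


# Hypothetical sample data, for illustration only.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},               # missing email
    {"customer_id": 2, "email": "b@example.com"},  # duplicate customer_id
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},            # customer 3 does not exist
]

print("Missing:", missing_values(customers, ["customer_id", "email"]))
print("Duplicates:", duplicate_keys(customers, "customer_id"))
print("Orphans:", orphaned_keys(orders, customers, "customer_id"))
```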
4. How well is the data protected?
Making data accessible shouldn’t mean trampling over regulatory and ethical requirements for protecting the data. It is a balancing act between making the data accessible and protecting it.
Data policies should be in place, covering access, retention and deletion. Policies that ensure GDPR (or other equivalent regulations) is implemented correctly should be in place too. Ideally, your customer-facing systems should have two things built in from the start: a clear purpose for using this data, and collection limited to what that purpose needs.
5. How well is the data governed?
Finally, we get to Data Governance. I don’t mean this in the traditional sense of having committees, forums, owners, stewards, etc. I mean providing a structure within which all the above functions can operate.
- Do you have core roles and responsibilities outlined in a framework?
- Who provides access?
- Who is responsible for DQ?
- Who is responsible for the catalogue and glossary?
- Who is responsible for data protection/privacy?
If you cannot answer the above questions unequivocally, you have some work to do in your Data Governance space.
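One lightweight way to test whether you can answer these questions is to write the answers down and check for gaps. The sketch below assumes a simple mapping of function to owner; the function labels and team names are hypothetical.

```python
# The four functions from the questions above; the labels are illustrative.
REQUIRED_FUNCTIONS = [
    "access",
    "data quality",
    "catalogue and glossary",
    "protection and privacy",
]


def unowned_functions(owners):
    """Return the required functions that have no named owner."""
    return [f for f in REQUIRED_FUNCTIONS if not owners.get(f, "").strip()]


# Hypothetical answers: two functions have an owner, two do not.
owners = {
    "access": "Platform team",
    "data quality": "Data engineering",
    "catalogue and glossary": "",
}

gaps = unowned_functions(owners)
print("Functions without an owner:", gaps)
```

Any function that comes back in the list is a question you cannot yet answer unequivocally.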
Conclusion
You can apply a simple weighting to each of the five questions, for example, 20% to DQ. Score each question, multiply by its weight, and add the results to get an overall weighted average for your Data Health. The score generated from this weighting exercise can be socialised across your business or made a core metric of the Chief Data Office.
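As a worked example of the weighting exercise, here is a minimal sketch. It assumes equal weights of 20% for each of the five questions and a 0 to 100 score per question; the individual scores are made up for illustration, and you may well choose different weights.

```python
def data_health_score(scores, weights):
    """Weighted average of per-question scores; weights must sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same questions")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[q] * weights[q] for q in weights)


# Assumption: equal 20% weights across the five questions.
weights = {
    "accessibility": 0.20,
    "context": 0.20,
    "quality": 0.20,
    "protection": 0.20,
    "governance": 0.20,
}

# Hypothetical self-assessment scores on a 0-100 scale.
scores = {
    "accessibility": 60,
    "context": 40,
    "quality": 80,
    "protection": 90,
    "governance": 50,
}

overall = data_health_score(scores, weights)
print(f"Overall Data Health: {overall:.1f} / 100")
```

With these numbers the overall score comes out at 64 out of 100, and the breakdown shows that context and governance are pulling it down the most.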
The goal should be to improve this metric to an acceptable target for the whole enterprise. If this has resonated with you, please share your thoughts by leaving a comment below.
If you would like to learn more about DQ in your data journey, check out this article:
Top 15 Most Common Data Quality Issues (And How to Fix Them)