Data Observability is all the rage in the industry right now.
Vast amounts of data and endless Data Quality issues have led us to this stage. Gartner places it in the "innovation trigger" phase, with at least 5–10 years’ worth of growth left before it plateaus. This means there are plenty of untapped use cases for Data Observability.
So, let’s look at seven of those use cases, specifically ones that improve Data Governance.
1. Predicting Data Issues
Historically, Data Quality processes have been reactive.
Issues were often first raised by business end users rather than caught by the data team. This meant the data team lost a lot of credibility, whilst Data Quality issues caused havoc for the business teams. Predicting Data Quality issues can help you take proactive measures to prevent or address them before they arise.
This builds trust with the business teams and eases the burden on the data teams.
For example: Train an ML model on historical Data Quality issues, using relevant data points such as the system of record, the pipeline, the frequency of failure, the time of month of the failure, and the type of failure. The resulting predictive model can help pinpoint Data Quality issues before they arise.
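As a rough illustration of the idea, the sketch below swaps the ML layer for something much simpler: a smoothed historical failure rate per pipeline and day of month. The `HISTORY` records, the pipeline names and the `0.6` threshold are all made up for the example, and a real model would use many more data points.

```python
from collections import Counter

# Hypothetical incident history: (pipeline, day_of_month, run_failed).
HISTORY = [
    ("billing_load", 1, True),
    ("billing_load", 1, True),
    ("billing_load", 15, False),
    ("crm_sync", 1, False),
    ("crm_sync", 15, False),
    ("crm_sync", 15, True),
]


def failure_rates(history):
    """Return a smoothed failure rate per (pipeline, day_of_month)."""
    runs, failures = Counter(), Counter()
    for pipeline, day, failed in history:
        runs[(pipeline, day)] += 1
        failures[(pipeline, day)] += int(failed)
    # Laplace smoothing so a single bad run does not look like certainty.
    return {key: (failures[key] + 1) / (runs[key] + 2) for key in runs}


def at_risk(history, threshold=0.6):
    """List the (pipeline, day_of_month) pairs worth checking proactively."""
    rates = failure_rates(history)
    return sorted(key for key, rate in rates.items() if rate >= threshold)


print(at_risk(HISTORY))
```

Even a crude score like this gives the data team a watchlist to check before the business users notice a problem.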
2. Intelligently Triaging Data Issues
Finding a Data Quality issue is the easy half of the problem.
Finding someone who understands the problem and can then fix it is the more significant challenge. Rapidly triaging the Data Quality issue and routing it to the most effective team can reduce the total downtime by 60%.
This avoids the endless Slack messages and conference calls to find the right person.
For example: Create a reference data table listing data owners, custodians and application owners, along with the types of Data Quality issues they usually resolve. If you have the budget, apply an ML layer on top that learns which teams are typically involved in resolving issues. Then use the Data Quality issue categorisation to route each issue to the right team.
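The rule-based version of this routing can be sketched as a simple lookup. Everything named here (`ROUTING_TABLE`, the team names, the example.com addresses, the issue categories) is hypothetical; in practice the table would live in your reference data store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Owner:
    team: str
    contact: str


# Hypothetical reference data: (system, issue category) -> usual resolver.
ROUTING_TABLE = {
    ("crm", "missing_values"): Owner("crm-data-owners", "crm-owners@example.com"),
    ("crm", "schema_change"): Owner("crm-app-team", "crm-app@example.com"),
    ("billing", "duplicates"): Owner("billing-custodians", "billing-data@example.com"),
}

# Fallback so that no issue is left without an owner.
DEFAULT_OWNER = Owner("data-governance", "governance@example.com")


def route_issue(system, category):
    """Pick the team that usually resolves this kind of issue."""
    return ROUTING_TABLE.get((system, category), DEFAULT_OWNER)


print(route_issue("crm", "schema_change").team)
print(route_issue("hr", "duplicates").team)
```

The default owner matters as much as the table itself: an unrouted issue is exactly what triggers the hunt for the right person.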
3. Improving Data Pipeline Efficiencies
Observing the data is more than just fixing Data Quality issues.
Finding core engineering problems, such as inefficient pipelines, is another use case for Data Observability. If a pipeline’s processing time exceeds an agreed threshold, it can be flagged to the engineering team to look for efficiencies.
This can streamline processes and reduce unnecessary communication and wasted team time.
For example: Set a processing-time threshold for each available pipeline, use Observability to pick out the runs that breach it, and direct those anomalies to the engineering team so they can analyse the processing times and find more efficient ways to process the data.
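A minimal sketch of the threshold check, assuming run durations are already collected somewhere. The pipeline names, the thresholds in `THRESHOLDS_MIN` and the durations in `RUNS` are invented for illustration.

```python
# Hypothetical agreed thresholds, in minutes, per pipeline.
THRESHOLDS_MIN = {"billing_load": 30, "crm_sync": 10}

# Hypothetical observed runs: (pipeline, duration in minutes).
RUNS = [
    ("billing_load", 25),
    ("billing_load", 42),
    ("crm_sync", 9),
    ("crm_sync", 18),
    ("adhoc_export", 5),  # no agreed threshold, so it is never flagged
]


def slow_runs(runs, thresholds):
    """Return (pipeline, minutes, minutes_over) for runs above their threshold."""
    flagged = []
    for pipeline, minutes in runs:
        limit = thresholds.get(pipeline)
        if limit is not None and minutes > limit:
            flagged.append((pipeline, minutes, minutes - limit))
    return flagged


for pipeline, minutes, over in slow_runs(RUNS, THRESHOLDS_MIN):
    print(f"{pipeline}: {minutes} min ({over} min over threshold)")
```

Reporting the overrun alongside the duration helps the engineering team prioritise the worst offenders first.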
4. Automated Data Cleansing & Resolution
This is nirvana.
As an organisation’s Data Observability capability matures, the ideal end state should be a process that needs no human intervention. Known Data Issues at source can be automatically cleaned each time they occur.
This can lead to a seamless experience for the business users without impacting the data teams.
For example: An ML layer or a rule-based list of Data Quality checks and their resolutions can be developed. This is then deployed along with the data pipelines to identify and correct Data Quality issues in real time. It also records all the changes in an outcome table should human intervention be required.
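The rule-based variant might look something like the sketch below. The three rules, the field names and the sample records are assumptions made for the example; the point is that every automatic fix is also written to an outcome log that a human can review later.

```python
def clean_records(records, rules):
    """Apply each rule to each record and log every change that was made."""
    cleaned, outcome_log = [], []
    for index, record in enumerate(records):
        row = dict(record)  # copy, so the source record stays untouched
        for rule_name, field, needs_fix, fix in rules:
            old = row.get(field)
            if needs_fix(old):
                row[field] = fix(old)
                outcome_log.append(
                    {"row": index, "rule": rule_name, "field": field,
                     "old": old, "new": row[field]}
                )
        cleaned.append(row)
    return cleaned, outcome_log


# Hypothetical rules: (name, field, check for the issue, resolution).
RULES = [
    ("trim_whitespace", "email",
     lambda v: isinstance(v, str) and v != v.strip(), lambda v: v.strip()),
    ("lowercase_email", "email",
     lambda v: isinstance(v, str) and v != v.lower(), lambda v: v.lower()),
    ("default_country", "country",
     lambda v: v in (None, ""), lambda v: "UNKNOWN"),
]

RECORDS = [
    {"email": " Ana@Example.com ", "country": "GB"},
    {"email": "bo@example.com", "country": ""},
]

CLEANED, OUTCOME_LOG = clean_records(RECORDS, RULES)
for change in OUTCOME_LOG:
    print(change)
```

Keeping the old and new values side by side in the log makes it possible to reverse a fix if a rule turns out to be wrong.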
5. Data Access Management
Observing data access is another way to improve Data Governance.
Data Observability can help you monitor who is accessing your data and how it is being accessed. By tracking data access logs, you can identify unauthorised access attempts, suspicious activity, or potential data breaches.
This can help you take corrective action and prevent data privacy violations.
For example: An Observability check can be implemented based on data lineage, including downstream use cases. Any access attempt that does not match the data’s intended use can be flagged to the relevant teams to investigate.
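One way to picture this check, assuming the approved downstream consumers of each dataset can be derived from lineage. The `APPROVED_USES` mapping, the log format and all names below are hypothetical.

```python
# Hypothetical approved downstream consumers per dataset, taken from lineage.
APPROVED_USES = {
    "customer_master": {"billing_report", "churn_model"},
    "payroll": {"payroll_run"},
}

# Hypothetical access log entries.
ACCESS_LOG = [
    {"user": "ana", "dataset": "customer_master", "consumer": "billing_report"},
    {"user": "bo", "dataset": "customer_master", "consumer": "marketing_export"},
    {"user": "cy", "dataset": "payroll", "consumer": "payroll_run"},
    {"user": "di", "dataset": "hr_reviews", "consumer": "adhoc_query"},
]


def flag_unexpected_access(log, approved):
    """Return entries whose consumer is not an approved use of the dataset.

    Datasets with no lineage entry have no approved uses, so any access
    to them is flagged for investigation.
    """
    return [
        entry for entry in log
        if entry["consumer"] not in approved.get(entry["dataset"], set())
    ]


for entry in flag_unexpected_access(ACCESS_LOG, APPROVED_USES):
    print(entry)
```

Flagging access to datasets that are missing from lineage is a deliberate choice here: unknown data is treated as a finding, not as a pass.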
6. Tracking Data Compliance
Data Compliance monitoring has been ad hoc historically.
Data Observability can monitor compliance by ensuring that sensitive data is adequately anonymised before sharing. Tracking the lineage of the data can help understand how data is being transformed and processed throughout the pipeline and identify potential issues that may impact compliance.
Tracking compliance in real time can reduce the burden on the governance team and minimise cost pressure on the data protection function.
For example: An Observability check can be implemented based on data lineage, specifically where data is shared externally. Any data not labelled as anonymised can be tracked and, if it is shared, an alert can be sent to the relevant teams to investigate.
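A simplified sketch of that check follows. It assumes lineage is available as a mapping from each dataset to its direct downstream targets, and it only inspects the dataset that feeds an external target directly. The dataset names and labels are invented.

```python
# Hypothetical lineage: dataset -> direct downstream targets.
LINEAGE = {
    "raw_customers": ["customers_anon", "customer_360"],
    "customers_anon": ["partner_feed"],
    "customer_360": ["vendor_export", "internal_dashboard"],
}

# Hypothetical labels maintained by the governance team.
ANONYMISED = {"customers_anon"}
EXTERNAL_TARGETS = {"partner_feed", "vendor_export"}


def external_sharing_alerts(lineage, anonymised, external_targets):
    """Return (source, target) pairs where non-anonymised data leaves the estate."""
    alerts = []
    for source, targets in lineage.items():
        for target in targets:
            if target in external_targets and source not in anonymised:
                alerts.append((source, target))
    return sorted(alerts)


for source, target in external_sharing_alerts(LINEAGE, ANONYMISED, EXTERNAL_TARGETS):
    print(f"ALERT: {source} is shared with {target} without an anonymised label")
```

A fuller version would walk the whole lineage graph rather than just the last hop, but the last hop is where the label matters most.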
7. Improving the Privacy Incident Process
Finding facts takes time when investigating a critical incident.
If there is real-time visibility into potential data privacy violations, it reduces the fact-finding time significantly. Data Observability can collate facts such as data access, data anonymisation, and data breaches and improve response time.
By quickly detecting and responding to these incidents, you can minimise the impact on your business operations and prevent further data loss or privacy violations.
For example: The Observability checks described in use cases 5 and 6 can supply the fact-finding section of the incident process. The Observability alerts, and how they were dealt with, are a further source of information for learning lessons from an incident.
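To make the fact-finding step concrete, here is a small sketch that pulls the relevant alerts into a timeline for one dataset. The alert records, their fields and the dates are all made up; the assumption is that access and compliance alerts are stored in a common format.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alerts raised earlier by access and compliance checks.
ALERTS = [
    {"at": datetime(2024, 3, 4, 14, 30), "dataset": "customer_360",
     "kind": "external_share_not_anonymised", "status": "open"},
    {"at": datetime(2024, 3, 4, 9, 15), "dataset": "customer_360",
     "kind": "unexpected_access", "status": "resolved"},
    {"at": datetime(2024, 3, 4, 11, 0), "dataset": "payroll",
     "kind": "unexpected_access", "status": "resolved"},
    {"at": datetime(2024, 2, 1, 8, 0), "dataset": "customer_360",
     "kind": "unexpected_access", "status": "resolved"},
]


def incident_timeline(alerts, dataset, start, end):
    """Return the alerts for one dataset inside the window, oldest first."""
    relevant = [
        a for a in alerts
        if a["dataset"] == dataset and start <= a["at"] <= end
    ]
    return sorted(relevant, key=lambda a: a["at"])


TIMELINE = incident_timeline(
    ALERTS, "customer_360", datetime(2024, 3, 1), datetime(2024, 3, 31)
)
SUMMARY = Counter(a["kind"] for a in TIMELINE)

for alert in TIMELINE:
    print(alert["at"].isoformat(), alert["kind"], alert["status"])
```

The `status` field is what feeds the lessons-learned step: it shows not only what was detected but how each alert was handled.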
Conclusion
The larger the data estate, the harder it is to manage. Data Observability eases that pain by introducing newer Data Management and Data Science techniques.