4 Revolutionary Ways to Improve Your Data Governance Team

Enhance Your Data Governance Practices with These 4 Transformative Approaches

Without effective Governance, organisations will find themselves wandering through a Digital Wild West.

Recent progress on the EU Artificial Intelligence Act has also established Data Governance as a must-have for organisations that deal with High-Risk AI Systems and want to stay compliant. Yet as the scale of data grows, many organisations have let their governance efforts regress rather than strengthening them, leaving their Data Governance practices weaker.

Let’s look at four ways modern techniques can improve Data Governance in your organisation.


1. Replace Data Stewards with a Co-Pilot

In my many years of implementing Data Governance for clients, I have yet to meet a happy Data Steward.

They have the impossible task of translating data into business terms and vice versa, with little authority to enforce compliance. The reasons for failure usually include poorly designed Governance models, a lack of senior management buy-in, or treating Governance as a one-off project with minimal investment.

A Co-Pilot built on an LLM can create policy documents, capture business and technical metadata, and verify that data is created according to agreed standards. Even if you can’t invest in a new tool, a Co-Pilot can be a Slack/Teams Bot linked to your organisation’s metadata tool, simply answering "FAQs" for the end user: who owns the data, which data is the master, what the definition of a particular column is, and so on.
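As a minimal sketch of the "FAQ bot" idea, the snippet below answers ownership, master-source and definition questions from a small in-memory catalogue. Everything in it is an assumption for illustration: the `ColumnMetadata` record, the `CATALOGUE` contents and the `answer_faq` function are hypothetical, and simple keyword matching stands in for the LLM and for the connection to a real metadata tool or chat platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical record, as it might be exported from a metadata tool."""
    owner: str
    master_source: str
    definition: str


# Illustrative catalogue; in practice this would be fetched from the metadata tool.
CATALOGUE = {
    "customer.date_of_birth": ColumnMetadata(
        owner="Customer Data Team",
        master_source="crm.customers",
        definition="Date of birth as provided by the customer at onboarding.",
    ),
}


def answer_faq(question: str, catalogue: dict[str, ColumnMetadata] = CATALOGUE) -> str:
    """Answer an ownership, master-source or definition question by keyword matching."""
    text = question.lower()
    # Find the first catalogued column that is mentioned in the question.
    column = next((name for name in catalogue if name in text), None)
    if column is None:
        return "Sorry, I could not find that data asset in the catalogue."
    meta = catalogue[column]
    if "own" in text:
        return f"{column} is owned by {meta.owner}."
    if "master" in text:
        return f"The master source for {column} is {meta.master_source}."
    if "definition" in text or "mean" in text:
        return f"{column}: {meta.definition}"
    return "I can answer questions about ownership, master source and definitions."
```

A bot handler would pass the incoming chat message to `answer_faq` and post the returned string back to the channel. Unknown assets get a polite fallback rather than a guess, which matters when the answer feeds a governance decision.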

How does this impact a data team?

Data Leaders can find cost savings with a reduction in full-time resources. Data Engineers can ensure enough metadata is documented and fed to the Co-Pilot to stop the queries from end users coming their way. Data Analysts/Scientists can rely on the Co-Pilot to answer enough questions to help with their analytical queries.


2. Reduce Forums and Committees and Add Intelligence in Decision Making

Forums are never 100% productive; there are always individuals attending for the sake of saying "thanks, bye!" at the end of the call.

Picture a typical calendar: a weekly data working group and metadata forum, a two-weekly data quality forum and a monthly data committee meeting. Although started with good intentions, most of these forums tend to have similar audiences, repeating the same or similar messages.

A decision matrix and intelligent workflow automation should be able to handle over 90% of incoming Data Governance requests. For example, suppose a request is for access to personal location data and the end business use case is demographic analytics. The governance tool can look up that combination of data type and use case in a decision matrix and return the answer.
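A decision matrix of this kind can be sketched as a simple lookup keyed by data classification and business use case. The classifications, use cases and outcomes below are hypothetical placeholders rather than policy recommendations; the one design choice worth noting is that any combination missing from the matrix is escalated to a human instead of being silently approved.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    APPROVE_WITH_MASKING = "approve_with_masking"
    ESCALATE = "escalate"


# Hypothetical matrix keyed by (data classification, business use case).
DECISION_MATRIX = {
    ("personal_location", "demographic_analytics"): Decision.APPROVE_WITH_MASKING,
    ("personal_location", "marketing_targeting"): Decision.ESCALATE,
    ("public_reference", "demographic_analytics"): Decision.APPROVE,
}


def decide(classification: str, use_case: str) -> Decision:
    """Return the matrix outcome, escalating to a human when no rule matches."""
    return DECISION_MATRIX.get((classification, use_case), Decision.ESCALATE)
```

Only the requests that come back as `Decision.ESCALATE` would need to reach a forum or committee; the rest can be answered by the workflow directly.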

How does this impact a data team?

Data Teams can spend less time in forums and focus efforts on strategic decision-making. Data Analysts/Scientists can rely on the intelligent decision output from the governance tool (that could be a Slack/Teams Bot, too) to understand whether the data is accessible.


3. Use Semantic Embedding to Intelligently Classify and Tag Data

Classifying and then tagging data adds much-needed context to Data Governance decision-making.

Historically, classifying data has been one of the most challenging jobs. It was either resource-intensive, as someone had to manually look through data assets and determine their classification, or technology-intensive, where data would have to be constantly scanned for changes and updates.

Semantic Embedding changes how this context is captured. For each data asset, a contextual embedding can be created in the form of a vector. Similarity scores between these vectors can then be used to rank candidate classifications for each asset. For example, data related to a customer, such as their address and date of birth, can be grouped as "personal data" using a vector similarity search. This also works for unstructured data like documents.
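To make the similarity step concrete, here is a minimal sketch that assigns a classification by cosine similarity. The three-dimensional vectors in `LABEL_VECTORS` are hand-made toy values standing in for real embeddings, which would come from an embedding model and have far more dimensions; the label names and the `threshold` of 0.8 are likewise assumptions.

```python
import math

# Toy 3-dimensional vectors standing in for real embeddings of each classification.
LABEL_VECTORS = {
    "personal data": [0.9, 0.1, 0.0],
    "financial data": [0.1, 0.9, 0.1],
    "operational data": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(asset_vector, label_vectors=LABEL_VECTORS, threshold=0.8):
    """Return (label, score) for the most similar label, or (None, score) below threshold."""
    label, score = max(
        ((name, cosine_similarity(asset_vector, vec)) for name, vec in label_vectors.items()),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else (None, score)
```

Assets that fall below the threshold come back with no label, so they can be routed to a person for review rather than being tagged with a weak match.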

How does this impact a data team?

Data Governance teams can spend less time manually classifying and tagging data. Intelligent workflows can be built around these classifications to provide decision support, answering questions such as "can the data be accessed?" and "is the data personal or sensitive?". This information can be surfaced to the Data Analyst/Scientist using Co-Pilots, as discussed in section 1, and used in decision making, as discussed in section 2.


4. Data Product Marketplace to Catalogue, Discover & Access Data

A Data Product Marketplace can be your one-stop shop for all your data needs.

Data tooling tends to be disparate: one tool for Engineering, another for Management, another for Governance and yet another for Privacy. Organisations often need to improve at integrating Data Governance across their estate, especially if it is agile and constantly changing.

The Marketplace is your area for cataloguing and then discovering data, as it will be linked to all your disparate data assets. All of the techniques above can be brought together centrally in a Data Product Marketplace. For example, a table with customer information will be catalogued in the Marketplace, classified and tagged using Semantic Embedding (3), accessed using an intelligent governance workflow (2), and contextualised by a Co-Pilot (1).
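One way to picture a Marketplace entry is as a single record that carries the outputs of the three earlier techniques. The `DataProduct` fields and the `search` helper below are hypothetical names chosen for illustration, not the schema of any particular catalogue product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical Marketplace entry that gathers the outputs of techniques 1 to 3."""
    name: str
    description: str                              # context a Co-Pilot could surface (1)
    tags: set[str] = field(default_factory=set)   # classification tags (3)
    access_rule: str = "escalate"                 # outcome of the governance workflow (2)


def search(products: list[DataProduct], tag: str) -> list[str]:
    """Discover data products by tag, returning their names in sorted order."""
    return sorted(p.name for p in products if tag in p.tags)
```

Searching by tag is only one discovery path; the same record could equally be queried by owner, access rule or free text in the description.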

How does this impact a data team?

Data Scientists can easily find data in the Marketplace and ask the Co-Pilot whether it can be used. Data Engineers can reuse the Marketplace data instead of creating more duplication. Data Analysts can use the profile of the data in the Marketplace to determine whether it’s good enough for their use case. Data Leaders can rely on a single platform for their governance and reporting needs.


Conclusion

The goal of a Modern Data Governance function is to be less human-dependent and closer to end users, acting as a helping hand rather than a hindrance wrapped in red tape.

Although I have proposed many new AI techniques to capitalise on this, one thing will be forever important: getting the foundations right. If your data is poor, you don’t need a Co-Pilot to regurgitate that message.

Getting Data Quality right is one of the hardest things to achieve, but it also delivers the most significant ROI. Check out my FREE Ultimate Data Quality Handbook to help you get your foundations right, and join my email list.

