Extended use of metadata in the modern data landscape, and its benefits

Tejasvi Addagada
Towards Data Science
6 min read · Jan 25, 2021

Enterprises are modernizing their data platforms and associated tool sets to serve the fast-changing needs of data practitioners, including data scientists, data analysts, business intelligence and reporting analysts, and the business and technology personnel who embrace self-service.

However, as the tool stack in most organizations is modernized, the variety of metadata it generates grows too. As the volume of data increases every day, the metadata associated with that data expands with it, and so does the need to manage it.

https://www.tejasviaddagada.com/
A generalized data landscape. Courtesy: Tejasvi Addagada

The first thought that strikes us when we look at a data landscape and hear about a catalog is: “It scans any database, from relational to NoSQL or graph, and gives out useful information.” That information typically includes:

  • Name
  • Modeled data-type
  • Inferred data types
  • Patterns of data
  • Length, with minimum and maximum thresholds
  • Minimum and maximum values
  • Other profiling characteristics of data like Frequency of values and their distribution
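As a rough illustration of the scan output listed above, here is a minimal sketch of column profiling in Python. It is not how any particular catalog product works: the function names (`infer_type`, `to_pattern`, `profile_column`), the type-guessing rules, and the sample column are all assumptions made for this example.

```python
import re
from collections import Counter


def infer_type(value: str) -> str:
    """Guess a simple data type from a raw string value (illustrative rules only)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"


def to_pattern(value: str) -> str:
    """Mask a value: digits become 9, letters become A, everything else is kept."""
    masked = re.sub(r"\d", "9", value)
    return re.sub(r"[A-Za-z]", "A", masked)


def profile_column(name: str, modeled_type: str, values: list[str]) -> dict:
    """Build a small profile of one column from its raw string values."""
    present = [v for v in values if v != ""]
    lengths = [len(v) for v in present]
    return {
        "name": name,
        "modeled_type": modeled_type,
        "inferred_types": Counter(infer_type(v) for v in present),
        "patterns": Counter(to_pattern(v) for v in present),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        # Values are compared as strings here, so min/max are lexicographic.
        "min_value": min(present, default=None),
        "max_value": max(present, default=None),
        "frequencies": Counter(present),
    }


# Hypothetical column: modeled as text, but mostly numeric in practice.
profile = profile_column(
    "postal_code", "VARCHAR(10)", ["560001", "560034", "560001", "SW1A 1AA"]
)
print(profile["inferred_types"])  # three values look like integers, one like a string
print(profile["patterns"])        # "999999" dominates, "AA9A 9AA" appears once
```

Comparing the modeled type with the inferred types and patterns is often where the useful surprises show up, such as a text column that almost always holds numbers.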

What are the basic benefits of Metadata managed in catalogs?

  1. Increased availability of intelligence about data, which brings better context to insights
  2. Reduced turnaround time to find answers during analysis
  3. Increased efficiency of subject-matter experts in producing information for impact analysis
  4. Reduced ambiguity in the relationships among data in the landscape
  5. Simpler views of data through meaning, identified redundancy, and relationships
https://www.tejasviaddagada.com/
Types of metadata, Courtesy: Tejasvi Addagada

The uses of metadata have multiplied over recent years, driven by technological advancements and public policy changes. Most enterprises now use catalogs for several use cases, such as the ones listed below.

1. Data discovery: associated with the doctrine of data democratization

  • Answering questions such as “Where does data exist physically, in schemas as objects and in instances as elements?”
  • Searching for data in single or multiple application systems of record, or in systems of reference such as lakes or warehouses

2. System privacy profiling: related to the convention of data protection and privacy management

  • Identifying attributes that are private to data subjects even when their modeled names do not suggest it
  • Helping establish the risk categorization of applications, including their logs in SOC operations
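To make the first point concrete, here is a minimal sketch of detecting private data from sampled values rather than from column names. The detector patterns, the 0.8 threshold, and the column names are assumptions chosen for illustration; real privacy scanners use far richer rules.

```python
import re

# Hypothetical detectors: label -> pattern that a whole value must match.
DETECTORS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "card_number": re.compile(r"\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}"),
}


def detect_private_columns(
    samples: dict[str, list[str]], threshold: float = 0.8
) -> dict[str, str]:
    """Return {column: label} where at least `threshold` of sampled values match."""
    findings = {}
    for column, values in samples.items():
        if not values:
            continue
        for label, pattern in DETECTORS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits / len(values) >= threshold:
                findings[column] = label
                break
    return findings


# Column names give no hint about the content; the values do.
samples = {
    "fld_07": ["ana@example.com", "raj@example.org", "li@example.net"],
    "notes": ["call back Monday", "n/a", "ana@example.com"],
    "pay_ref": ["4111 1111 1111 1111", "5500-0000-0000-0004"],
}
findings = detect_private_columns(samples)
print(findings)  # {'fld_07': 'email', 'pay_ref': 'card_number'}
```

The threshold keeps a free-text column with one stray email address from being labeled as an email column, although such a column may still deserve a manual review.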

3. Controlling access to data: analogous to the principle of data security

  • Identifying data entitlements and handling them in a single repository
  • Managing user groups, users, data access policies, owners who can grant/revoke access to data
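One way to picture a single repository of entitlements is the sketch below. The class name `EntitlementRepository`, its fields, and the sample users are hypothetical; it only illustrates the idea that owners grant and revoke access for user groups in one place.

```python
from dataclasses import dataclass, field


@dataclass
class EntitlementRepository:
    """Hypothetical single store for data owners, user groups, and access grants."""

    owners: dict[str, set[str]] = field(default_factory=dict)   # dataset -> owners
    members: dict[str, set[str]] = field(default_factory=dict)  # group -> users
    grants: dict[str, set[str]] = field(default_factory=dict)   # dataset -> groups

    def grant(self, actor: str, dataset: str, group: str) -> None:
        """Give a group access to a dataset; only an owner may do this."""
        self._require_owner(actor, dataset)
        self.grants.setdefault(dataset, set()).add(group)

    def revoke(self, actor: str, dataset: str, group: str) -> None:
        """Remove a group's access to a dataset; only an owner may do this."""
        self._require_owner(actor, dataset)
        self.grants.get(dataset, set()).discard(group)

    def can_access(self, user: str, dataset: str) -> bool:
        """A user has access if any of their groups has been granted the dataset."""
        granted_groups = self.grants.get(dataset, set())
        return any(user in self.members.get(g, set()) for g in granted_groups)

    def _require_owner(self, actor: str, dataset: str) -> None:
        if actor not in self.owners.get(dataset, set()):
            raise PermissionError(f"{actor} does not own {dataset}")


repo = EntitlementRepository(
    owners={"customer_master": {"priya"}},
    members={"analysts": {"sam", "lee"}},
)
repo.grant("priya", "customer_master", "analysts")
print(repo.can_access("sam", "customer_master"))  # True
print(repo.can_access("zoe", "customer_master"))  # False
```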

4. Data administration: connected with the aspects of managing and governing data

  • Curating and identifying processes related to data creation and processing
  • People information, such as the owners of data, business, and processes, and the personnel stewarding data
  • Finding commonality in the ownership of data across an organization to manage context

5. Meaning: associated with the principle of “interpreting data in the right sense”

  • Definitions of what data means in a specific situation and to a specific person
  • Collecting contextual descriptions and finding a single, common one based on how data is applied in processes

6. Usage: corresponding to the principle of interoperability of data within a firm and beyond

  • Means of usage including reports, dashboards, artificial intelligence models
  • Frequency of usage, vintage of artifacts using the specific data

7. Classifying data for better management: correlated with the principle of data availability

  • Pace of change and application of data: master, reference, and transaction data
  • Privacy classifications: private, sensitive, special category, and behavioral data
  • Labels: national identifiers, addresses, names, card-related data, and health information
  • Transformation classifications: native, derived, or transformed data
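The classification dimensions above could be captured as a small typed record per data element. The sketch below is only one possible shape: the class names, the element names, and the classifications assigned to them are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ChangePace(Enum):
    MASTER = "master"
    REFERENCE = "reference"
    TRANSACTION = "transaction"


class Privacy(Enum):
    PRIVATE = "private"
    SENSITIVE = "sensitive"
    SPECIAL_CATEGORY = "special category"
    BEHAVIORAL = "behavioral"


class Transformation(Enum):
    NATIVE = "native"
    DERIVED = "derived"
    TRANSFORMED = "transformed"


@dataclass(frozen=True)
class Classification:
    element: str
    pace: ChangePace
    transformation: Transformation
    privacy: Optional[Privacy] = None    # None: no privacy class applies
    labels: frozenset[str] = frozenset()


# Hypothetical catalog entries.
catalog = [
    Classification(
        "customer.national_id",
        ChangePace.MASTER,
        Transformation.NATIVE,
        Privacy.PRIVATE,
        frozenset({"national identifier"}),
    ),
    Classification("country.iso_code", ChangePace.REFERENCE, Transformation.NATIVE),
    Classification("order.total_amount", ChangePace.TRANSACTION, Transformation.DERIVED),
]

# Elements that carry any privacy classification.
restricted = [c.element for c in catalog if c.privacy is not None]
print(restricted)  # ['customer.national_id']
```

Keeping each dimension as its own enumeration means a data element gets exactly one value per dimension, while free-form labels remain a set.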

8. Canonical management: logical groups, names, canonical modeling attribute names, other standard modeling names, and class associations in standards such as BIAN and MISMO

9. Rules operations: related to the principles of interoperability and coverage

  • Rules are an integral part of business metadata, yet they are often ignored in the operational metadata processes orchestrated for operations
  • Rules can be classified as business rules, policy enforcement rules, derivation and transformation rules, and data quality rules, and tracked alongside rule execution statistics
  • Maintaining business rules is an excellent enabler for impact analysis, data analysis, and needs analysis
  • Managing relationships between data becomes easier when you can find the rules that the data is a party to
https://www.tejasviaddagada.com/
Types of rules managed in a catalog, Courtesy: Tejasvi Addagada
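As a small illustration of keeping rules and their execution statistics next to the data they govern, consider the sketch below. The `DataQualityRule` class, the rule names, and the sample values are assumptions for this example, and only data quality rules are shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataQualityRule:
    name: str
    element: str                     # data element the rule is a party to
    check: Callable[[object], bool]  # returns True when a value passes
    executed: int = 0
    failed: int = 0

    def run(self, values: list) -> None:
        """Apply the rule to each value and accumulate execution statistics."""
        for value in values:
            self.executed += 1
            if not self.check(value):
                self.failed += 1

    @property
    def pass_rate(self) -> float:
        if self.executed == 0:
            return 1.0
        return (self.executed - self.failed) / self.executed


rules = [
    DataQualityRule(
        "age_in_range", "customer.age",
        lambda v: isinstance(v, int) and 0 <= v <= 120,
    ),
    DataQualityRule("email_present", "customer.email", lambda v: bool(v)),
]


def rules_for(element: str) -> list[str]:
    """Find the rules that a data element is a party to."""
    return [r.name for r in rules if r.element == element]


rules[0].run([34, 51, -3, 200])
print(rules[0].pass_rate)         # 0.5
print(rules_for("customer.age"))  # ['age_in_range']
```

A lookup such as `rules_for` is the kind of query that supports impact analysis: before changing a data element, you can see which rules would be affected.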

10. Data operations: extends the principle of data distribution management

  • Helps in understanding data usage, derived or native characteristics, vintage, last-used dates, pipelines, archival and destruction policies, partitions, jobs, and schedules

How does Governance enable Metadata?

Metadata management also requires analysts to put information into a catalog at the right stage of a change. This can be done by consistently including the right stakeholders throughout the lifecycle of data. Data has a lifecycle too, POSMAD (Plan, Obtain, Store/Share, Maintain, Apply, Decay), which helps bring out the lineage.

Even agile management models such as Scrum, Kanban, DAD, and FDD can benefit from curating and using institutional knowledge about data in projects to accelerate the delivery of features. Data governance can balance the hosting and serving of metadata, ensuring that metadata works for most use cases.

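For a given business term or data element, that metadata should answer questions such as:

  • Where does it come from?
  • Which processes does it apply to?
  • Who uses the business term?
  • Which systems leverage the data element for storing, sharing, transforming, and decaying?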
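These four questions map fairly naturally onto one record per business term. The following is a minimal sketch under that assumption; the `BusinessTerm` class, its field names, and the sample term are hypothetical and not taken from any specific catalog's meta-model.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTerm:
    name: str
    source_system: str                                # where it comes from
    processes: set[str] = field(default_factory=set)  # which processes it applies to
    users: set[str] = field(default_factory=set)      # who uses the business term
    # system -> lifecycle activities it performs on the data element
    systems: dict[str, set[str]] = field(default_factory=dict)

    def systems_that(self, activity: str) -> list[str]:
        """List the systems that perform a given activity, in alphabetical order."""
        return sorted(s for s, acts in self.systems.items() if activity in acts)


term = BusinessTerm(
    name="Customer Risk Rating",
    source_system="onboarding_app",
    processes={"KYC review", "credit approval"},
    users={"risk analysts", "relationship managers"},
    systems={
        "onboarding_app": {"storing"},
        "data_lake": {"storing", "sharing"},
        "risk_engine": {"transforming"},
    },
)
print(term.systems_that("storing"))  # ['data_lake', 'onboarding_app']
```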

As governance formalizes the active management of metadata through a specific operating rhythm and set of processes, it becomes much easier to integrate metadata into project lifecycles that plan for data changes or usage. A data governance function provides the leeway to put in place guide rails or guard rails that help assess, direct, and monitor the management of metadata, in support of the organization's goals for managing data and metadata.

Moreover, managing metadata requires a standard framework that channels personnel toward capturing the information associated with data.

Some questions that commonly arise while managing catalogs

  1. Have you democratized the catalog so that any personnel in the organization can contribute what they know about data? Are there identified data stewards who can give direction in baselining the information that the organization can use?
  2. Is metadata from sources fueling the management of schema drift in data lakes and warehouses? How often should metadata from sources be scanned or pushed into the catalog?
  3. Business terms are often used in varied contexts under specialized names. Are you also capturing their synonyms or common names to ease usage globally?
  4. Is metadata management bridging the gap between regional operations, global operations, and IT? Business analysis is poised to enable this communication, but metadata management comes with enablers that can accelerate it.
  5. Do you have a forum today where all the relevant stakeholders can reach a common understanding of, or enrich, what they know about data before it is published?
  6. How are you planning to help the business self-serve with the excellent information that has been captured in the metadata repository? Is the catalog too technical to be accepted?
  7. Is compliance consuming most of your effort while giving little direction on why you are doing this in the first place?
  8. Does your meta-model account for the various use cases, such as how many data owners a business term can have, how many contributors and viewers it can have, and how many systems of truth can be defined?
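For the second question in the list, schema drift can be spotted by comparing what the catalog holds against a fresh scan of the source. The sketch below assumes both are available as simple column-to-type mappings; the function name and the sample schemas are made up for illustration.

```python
def detect_schema_drift(
    cataloged: dict[str, str], scanned: dict[str, str]
) -> dict[str, dict]:
    """Compare column -> type mappings from the catalog and from a fresh source scan."""
    shared = cataloged.keys() & scanned.keys()
    return {
        "added": {c: scanned[c] for c in scanned.keys() - cataloged.keys()},
        "removed": {c: cataloged[c] for c in cataloged.keys() - scanned.keys()},
        "retyped": {
            c: (cataloged[c], scanned[c]) for c in shared if cataloged[c] != scanned[c]
        },
    }


# Hypothetical state of one table in the catalog versus the latest scan.
cataloged = {"id": "int", "email": "varchar", "age": "int"}
scanned = {"id": "bigint", "email": "varchar", "signup_ts": "timestamp"}

drift = detect_schema_drift(cataloged, scanned)
print(drift["added"])    # {'signup_ts': 'timestamp'}
print(drift["removed"])  # {'age': 'int'}
print(drift["retyped"])  # {'id': ('int', 'bigint')}
```

How often to run such a comparison depends on how quickly the sources change; the scan frequency is exactly the trade-off the question asks about.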

Business metadata does not generate itself; it requires every responsible stakeholder, system, and process to consistently produce metadata, including definitions and administrative aspects. Vendors in this space focus on maximizing the automated pull of information from databases and systems. However, it is metadata that is actively managed that enables organizations to govern data better.

A catalog can be a means of enriching collaboration between data personnel on data operations. Further reading on using a catalog as a collaborative tool — https://www.dattamza.org/dattamza-blog/data-catalog-as-a-single-means-to-collaborate-through-data-operations

Tejasvi Addagada is a data strategist and consultant assisting Fortune 500 firms. He helps build and optimize data management and governance solutions.