Data Contracts — ensure robustness in your data mesh architecture

Piethein Strengholt
Towards Data Science
7 min read · Mar 17, 2022

In a federated architecture, in which responsibilities are distributed across domains, it’s harder to oversee dependencies and gain insight into data usage. This is where data contracts come into play. Why do data contracts matter? Because they show who owns which data products. They support setting standards and managing your data pipelines with confidence. They record which data products are being consumed, by whom, and for what purpose. Bottom line: data contracts are essential for robust data management!

Before you continue reading, I encourage you to look at data product distribution and usage from two dimensions. First, there are technical concerns, such as data pipeline handling and mutual expectations on data stability. Second, there are business concerns, like agreeing on the purpose of data sharing, which may include objectives for usage and privacy as well as limitations on purpose. Typically, different roles come into play for each dimension. For technical concerns, you commonly rely on application owners or data engineers. For business concerns, you commonly rely on product owners or business representatives.
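To make the two dimensions concrete, here is a minimal Python sketch of how a contract record might keep them apart. Every class and field name below (`TechnicalTerms`, `BusinessTerms`, `schema_version`, and so on) is a hypothetical illustration, not a standard and not a template from this article.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TechnicalTerms:
    """Technical dimension: what the consuming pipeline can rely on (hypothetical fields)."""
    schema_version: str      # e.g. "1.2.0"
    delivery_schedule: str   # e.g. "daily"
    technical_contact: str   # application owner or data engineer


@dataclass(frozen=True)
class BusinessTerms:
    """Business dimension: why the data is shared and within which limits (hypothetical fields)."""
    purpose: str
    business_contact: str                # product owner or business representative
    allowed_usage: Tuple[str, ...] = ()
    contains_personal_data: bool = False


@dataclass(frozen=True)
class DataContract:
    """One agreement between a providing domain and a consuming domain."""
    data_product: str
    provider_domain: str
    consumer_domain: str
    technical: TechnicalTerms
    business: BusinessTerms


def consumers_of(contracts: Iterable[DataContract], data_product: str) -> List[str]:
    """Return the sorted, de-duplicated consuming domains of a data product."""
    return sorted({c.consumer_domain for c in contracts if c.data_product == data_product})
```

With contracts stored this way, a question such as "who consumes the `customers` data product?" becomes a lookup, for example `consumers_of(all_contracts, "customers")`, where `all_contracts` is whatever collection of records you maintain.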

Data Contracts

Data contracts are like data delivery contracts or service contracts. They’re important because when data products become popular and widely used, you need to implement versioning and manage compatibility. This is needed because in a larger or distributed architecture it’s harder…
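As an illustration of versioning and compatibility management, the sketch below assumes data products follow a semantic-versioning convention (`MAJOR.MINOR.PATCH`, where a major bump signals a breaking change). That convention is an assumption made for this example; the article does not prescribe a scheme, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(contracted: str, delivered: str) -> bool:
    """Check whether a delivered version still honours a contracted version.

    Assumed rule: same major version, and the delivered version is not
    older than the one the consumer signed up for.
    """
    c = parse_version(contracted)
    d = parse_version(delivered)
    return d[0] == c[0] and d >= c


def impacted_consumers(contracted_versions: Dict[str, str], new_version: str) -> List[str]:
    """List the consumers whose contracted version a new release would break."""
    return sorted(
        consumer
        for consumer, contracted in contracted_versions.items()
        if not is_compatible(contracted, new_version)
    )
```

For example, `impacted_consumers({"marketing": "1.2.0", "finance": "2.0.0"}, "2.0.0")` returns `["marketing"]`: the provider knows which consumer to contact before retiring the 1.x version.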

Written by Piethein Strengholt

Hands-on Chief Data Officer. Working @Microsoft.

Responses (3)

How would this compare to consumer driven contracts and the pact framework?

--

For automating our governance processes, we've been repurposing the mechanisms and lessons from our CI/CD/SCM systems. For each data domain, there is a git location and data pipeline. Domain-level changes like the following go through this pipeline:
1…

--

Piethein, I wanted to share that PayPal released its template of a data contract in Open Source, following an Apache 2 license. You can find it here: https://github.com/paypal/data-contract-template

--