Data Observability: How to Fix Data Quality at Scale

Introducing a new approach to preventing broken analytics dashboards and increasing trust in your data.

Published in Towards Data Science, Jan 13, 2021 (6 min read)

Companies spend upwards of $15 million annually tackling data downtime (periods of time when data is missing, broken, or otherwise erroneous), and 1 in 5 companies has lost a customer because of incomplete or inaccurate data.

Fortunately, there’s hope in the next frontier of data: observability. Here’s how data engineers and BI analysts at Yotpo, a global eCommerce company, increase cost savings, collaboration, and productivity with data observability at scale.

Yotpo works with eCommerce companies across the world to help them accelerate online revenue growth through reviews, visual marketing, loyalty and referral programs, and SMS marketing.

For Yoav Kamin, Director of Business Performance, and Doron Porat, Data Engineering Team Leader, having consistently accurate and reliable data is foundational to the success of this mission.

The challenge: broken data pipelines & dashboards

Since day one, Yotpo has invested in a distributed data platform that supports the diverse data needs of its internal teams, from generating marketing reports to empowering product development teams to build better services for their users.

Over the past few years, Yotpo has grown exponentially, expanding their operations globally and acquiring companies including Swell Rewards and SMSBump. As Yotpo grew, so too did the number of data sources and the complexity of their data pipelines. Over time, it became harder to keep track of data completeness, lineage, and quality, three critical features of reliable data. The resulting data downtime led to time-intensive and costly data fire drills that caused friction between Yotpo’s Business Performance and Data Engineering teams.

“Time and again, our staff would approach my team and tell us the data is wrong, but we had no idea how the data broke in the first place,” said Doron. “It was clear to us that we had to gain better control over our data pipelines, as we can’t have our data consumers alerting us on data issues and keep getting caught by surprise; no one would be able to trust our analytics this way.”

To tackle this problem, Yotpo needed a better way to manage data health and discovery. At the end of the day, they required a solution that would empower them with the right information at the right time to alert for and prevent abnormalities in their data pipelines and business intelligence dashboards, before they impacted the business.

The solution: data observability

To help them eliminate data downtime and unlock the potential of their data, Yotpo chose a proactive approach to data observability that would automatically monitor key features of Yotpo’s data ecosystem, including data freshness, distribution, volume, schema, and lineage.

Without the need for manual threshold setting, their team received quick answers to questions such as:

When was my table last updated?
Is my data within an accepted range?
Is my data complete? Did 2,000 rows suddenly turn into 50?
Who has access to our marketing tables and made changes to them?
Where did my data break? Which tables or reports were affected?
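The first three questions above map to freshness, range, and volume checks. The following is a minimal sketch of what such checks could look like, not the tooling Yotpo used: the function names and thresholds are hypothetical, and unlike the automated approach described in this article, the thresholds here are set by hand.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, max_age, now=None):
    """Freshness: True if the table was last updated longer ago than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_age


def out_of_range(values, low, high):
    """Distribution: return the values outside the accepted [low, high] range."""
    return [v for v in values if not (low <= v <= high)]


def volume_dropped(previous_rows, current_rows, max_drop_ratio=0.5):
    """Volume: True if the row count fell by more than max_drop_ratio."""
    if previous_rows == 0:
        return False  # nothing to compare against
    drop = (previous_rows - current_rows) / previous_rows
    return drop > max_drop_ratio


# Hypothetical example: a table last loaded 36 hours ago, checked against
# a hand-picked 24-hour freshness limit, and a load that shrank from
# 2,000 rows to 50.
now = datetime(2021, 1, 13, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2021, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_stale(last_load, timedelta(hours=24), now=now))
print(volume_dropped(2000, 50))
```

In practice, the limits passed to these functions would be learned from each table's history rather than hard-coded, which is the part an observability platform automates.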

Outcome: Cost savings by catching data anomalies before they impact downstream consumers

Out of the box, data observability gave Yotpo an overview of their Redshift environment, including all data assets, schemas, and tables. Its machine learning algorithms automatically generated rules to inform data downtime monitoring and alerting, providing immediate value.

ML algorithms tuned to Yotpo’s data ecosystem detected data anomalies before they impacted downstream data assets. Image courtesy of Barr Moses.

A recent example of data observability’s impact for the team was when an erroneous data point in Yotpo’s Segment instance generated 6x more rows than expected, even with seasonality and normal data fluctuations taken into account.

“When the spike happened, our data observability engine alerted my team immediately, allowing us to investigate and troubleshoot the anomaly before it affected downstream data consumers on the Marketing team,” said Doron. “Since we caught this right away, I was able to rest assured that no important business metrics would be impacted.”

Since the Data Engineering team was notified of the issue before it affected their stakeholders, they were able to fix their pipeline and prevent future anomalies from jeopardizing the integrity of their data.
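To make the idea of a row-count spike concrete, here is a minimal sketch of one common way to flag it: a modified z-score based on the median and the median absolute deviation (MAD). This is an illustrative stand-in, not the ML model described above. The function name, the sample counts, and the 3.5 cutoff are assumptions, and seasonality is handled only crudely, by comparing against history from comparable periods (for example, the same weekday).

```python
from statistics import median


def is_volume_anomaly(history, today, threshold=3.5):
    """Flag today's row count if it sits far from the historical median.

    history: row counts from comparable past periods (e.g. same weekday).
    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    """
    center = median(history)
    mad = median([abs(x - center) for x in history])
    if mad == 0:
        # History is perfectly flat, so any deviation counts as anomalous.
        return today != center
    score = 0.6745 * abs(today - center) / mad
    return score > threshold


# Hypothetical daily row counts for past comparable days.
history = [1000, 1040, 980, 1010, 995, 1025]
print(is_volume_anomaly(history, 6030))  # roughly 6x the usual volume
print(is_volume_anomaly(history, 1015))  # within normal fluctuation
```

The median and MAD are used instead of the mean and standard deviation so that one earlier spike in the history does not mask the next one.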

Outcome: Improved collaboration by tracing field-level lineage

Another use case where data observability became crucial to Yotpo’s day-to-day operations was when Yotpo’s Business Applications team, the group responsible for integrating and maintaining internal operational systems such as Salesforce, wanted to replace an outdated field with a new one. Many of their dashboards relied heavily on this field, so they had to prepare for the change in advance.

Data observability enabled Yotpo to understand dependencies for critical data assets. Image courtesy of Barr Moses.

Using information-rich query logs, data observability helped Yotpo understand 1) what the downstream dependencies for this field are, 2) who is using the table and how, 3) which dashboards are currently using the old field (lineage), and 4) where the new field has been added, so they could track the progress of the update.
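As a rough illustration of the lineage idea, the sketch below treats dependencies as a directed graph and walks it to find everything downstream of a field. It assumes the (upstream, downstream) pairs have already been extracted from query logs; that extraction step, along with every asset name shown, is hypothetical and not taken from Yotpo's environment.

```python
from collections import defaultdict, deque


def build_lineage(edges):
    """Build an adjacency map from (upstream, downstream) pairs."""
    graph = defaultdict(set)
    for upstream, downstream in edges:
        graph[upstream].add(downstream)
    return graph


def downstream_of(graph, start):
    """Return every asset reachable from start (breadth-first traversal)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical dependencies, as they might be parsed from query logs.
edges = [
    ("salesforce.account.old_field", "analytics.accounts"),
    ("analytics.accounts", "dashboard.revenue"),
    ("analytics.accounts", "dashboard.churn"),
    ("salesforce.account.new_field", "analytics.accounts_v2"),
]
lineage = build_lineage(edges)
print(sorted(downstream_of(lineage, "salesforce.account.old_field")))
```

Running the same traversal from the new field shows where it has already been adopted, which is one way to track the progress of a migration like this.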

“We use lineage to highlight upstream and downstream dependencies in our data ecosystem, including Salesforce, to give us a better understanding of our data health,” said Yoav. “Instead of being reactive and fixing the dashboard after it breaks, we have visibility that we need to be proactive.”

Outcome: Increasing productivity by keeping tabs on deprecated data sets

Data observability gives Yotpo greater transparency into the health of important data sets and tables. Image courtesy of Barr Moses.

Additionally, data observability gives Yotpo greater transparency into the relevancy and usage patterns of important data assets, informing them when different attributes (such as record type ID and specific data sets) are deprecated. It also lets them track where new fields are being added and used, so they can keep tabs on which dashboards need to be updated. This knowledge ensures Yotpo can trust their data to be accurate and reliable, even as their data platform evolves.

“Once you lose trust in your data, you lose reliability,” said Doron. “With data observability, there’s less data downtime and more data reliability. Long gone are the days of painful schema changes and broken dashboards.”

Impact of Data Observability at Yotpo

For Yotpo, data observability empowered them to solve data quality issues fast so they could start trusting their data to deliver reliable, actionable insights for the business.

“Our execs rely on my team’s dashboards to make decisions. With data observability, we know exactly what to update when there’s a change in our data, so there’s no downtime and no fire drills. Our decision makers are happier and I can sleep at night,” said Yoav.

Among other benefits, data observability has enabled Yotpo to:

Increase cost savings by reducing the time to resolution of tedious data fire drills and restoring trust in data for vital decision making
Improve collaboration between data engineering and data analyst teams by clarifying key dependencies between data assets
Drive greater efficiency and productivity by gaining end-to-end visibility into the health, usage patterns, and relevancy of data assets

With data observability in place, Yoav, Doron, and their data team are well-prepared to eliminate data downtime and continue unlocking the potential of their data.

Interested in learning more about data observability? Reach out to Barr Moses.

This article was co-written by Will Robins. Special thanks to Yoav, Doron, and the rest of the Yotpo data team!