Managing dependencies between data pipelines in Apache Airflow & Prefect

A simple approach to managing dependencies between your workflows

Anna Geller
Towards Data Science
7 min read · Sep 4, 2020


Photo by Kelly Sikkema on Unsplash

If you have ever built data pipelines for co-dependent business processes, you may have noticed that incorporating all of your company’s business logic into a single workflow does not work well and quickly turns into a maintenance nightmare. Many workflow scheduling systems let us manage dependencies within a single data pipeline, but they offer little support for managing dependencies between workflows.

A natural way to resolve this problem is to split a large pipeline into many smaller ones and coordinate them in some parent-child relationship. There are many possible ways of doing this, and I want to share one simple approach that has worked well for me. I hope it helps you manage dependencies between your data pipelines.
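To make the parent-child idea concrete, here is a minimal sketch of a parent pipeline explicitly triggering a child pipeline in Airflow. This is one possible illustration, not the approach the article settles on; it assumes Airflow 2.x, and the DAG IDs and schedule are hypothetical placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Parent DAG: runs its own tasks, then explicitly kicks off the child DAG.
with DAG(
    dag_id="parent_pipeline",            # hypothetical DAG ID
    start_date=datetime(2020, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    trigger_child = TriggerDagRunOperator(
        task_id="trigger_child_pipeline",
        trigger_dag_id="child_pipeline",  # hypothetical child DAG to start
    )
```

The trade-off of this “push” style is that the parent must know about every child it triggers, which couples the pipelines in the opposite direction.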

How the Airflow community tried to tackle this problem

In the book about Apache Airflow [1], written by two data engineers from GoDataDriven, there is a chapter on managing dependencies. This is how they summarize the issue:

“Airflow manages dependencies between tasks within one single DAG, however it does not […]”
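The cross-DAG primitives the Airflow community typically reaches for here are ExternalTaskSensor (a downstream DAG waits for a task in an upstream DAG) and the TriggerDagRunOperator shown earlier. A minimal sketch of the sensor pattern, again assuming Airflow 2.x and with hypothetical DAG and task IDs:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.external_task import ExternalTaskSensor

# Child DAG: waits until a given task in the parent DAG has succeeded for the
# same logical date, then runs its own work.
with DAG(
    dag_id="child_pipeline",               # hypothetical DAG ID
    start_date=datetime(2020, 9, 1),
    schedule_interval="@daily",            # must line up with the parent's schedule
    catchup=False,
) as dag:
    wait_for_parent = ExternalTaskSensor(
        task_id="wait_for_parent",
        external_dag_id="parent_pipeline",  # upstream DAG we depend on
        external_task_id="load_data",       # upstream task we wait for (hypothetical)
        mode="reschedule",                  # free the worker slot between pokes
        timeout=60 * 60,                    # fail after an hour of waiting
    )

    run_report = BashOperator(
        task_id="run_report",
        bash_command="echo 'parent finished, running child work'",
    )

    wait_for_parent >> run_report
```

Note that by default the sensor matches on the same execution date, so the two schedules have to line up exactly; the `execution_delta` and `execution_date_fn` parameters exist to handle offset schedules, which is one of the sharp edges this pattern is known for.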
