Managing dependencies between data pipelines in Apache Airflow & Prefect
A simple approach to managing dependencies between your workflows
If you have ever built data pipelines for co-dependent business processes, you may have noticed that packing all of your company’s business logic into a single workflow does not work well and quickly turns into a maintenance nightmare. Many workflow scheduling systems let us manage dependencies within a single data pipeline, but they do not help us manage dependencies between workflows.
A natural way to resolve this problem is to split a large pipeline into many smaller ones and coordinate the dependencies between them in a parent-child relationship. There are many possible ways to do this, and I want to share one simple approach that has worked well for me. I hope it helps you manage dependencies between your own data pipelines.
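To make the parent-child idea concrete before looking at any specific tool, here is a minimal sketch in plain Python. It is not Airflow or Prefect code: the child workflow names, the `CHILD_FLOWS` mapping, and `run_parent_flow` are all hypothetical and exist only for illustration. The parent simply runs each child after the children it depends on, using the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical child workflows. Each one stands in for a small,
# separately maintained pipeline; here they only record that they ran.
def extract_sales(log):
    log.append("extract_sales")

def extract_customers(log):
    log.append("extract_customers")

def build_report(log):
    log.append("build_report")

# Assumed dependency map: child name -> (callable, names of upstream children).
CHILD_FLOWS = {
    "extract_sales": (extract_sales, set()),
    "extract_customers": (extract_customers, set()),
    "build_report": (build_report, {"extract_sales", "extract_customers"}),
}

def run_parent_flow():
    """Parent workflow: run every child after all of its upstream children."""
    log = []
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {name: deps for name, (_, deps) in CHILD_FLOWS.items()}
    for name in TopologicalSorter(graph).static_order():
        # An exception raised here stops the loop, so downstream
        # children never start when an upstream child fails.
        CHILD_FLOWS[name][0](log)
    return log
```

Calling `run_parent_flow()` runs the two extract flows first (in either order) and `build_report` last. A real scheduler adds retries, scheduling, and monitoring on top, but the coordination problem is the same: the parent owns the ordering, and each child stays small enough to maintain on its own.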
How the Airflow community tried to tackle this problem
The book on Apache Airflow [1], written by two data engineers from GoDataDriven, includes a chapter on managing dependencies. This is how they summarize the issue:
“Airflow manages dependencies between tasks within one single DAG, however it does not…