Hands-on Tutorials

Prefect: How to Write and Schedule Your First ETL Pipeline with Python

Workflow management systems made easy — both locally and in the cloud

Dario Radečić
Towards Data Science
10 min read · Jul 22, 2021

Photo by Helena Lopes from Pexels

Prefect is a Python-based workflow management system built on a simple premise: your code probably works, but sometimes it doesn’t (source). No one thinks about workflow systems when everything works as expected. But when things go south, Prefect is there to make sure your code fails successfully.

As a workflow management system, Prefect makes it easy to add logging, retries, dynamic mapping, caching, failure notifications, and more to your data pipelines. It stays invisible when everything works as expected and becomes visible only when something breaks. Think of it as insurance.

While Prefect isn’t the only workflow management system available to Python users, it is arguably one of the most capable. Alternatives such as Apache Airflow usually work well, but can become cumbersome on large projects. You can read a detailed comparison between Prefect and Airflow here.

This article covers the basics of the library, such as tasks, flows, parameters, failures, and schedules, and also explains how to set up the environment both locally and in the cloud. We’ll use Saturn Cloud for that part, as it makes the configuration effortless. It is a cloud platform made by data scientists, so most of the heavy lifting is done for you.

Saturn Cloud can handle Prefect workflows without breaking a sweat. It is also a cutting-edge solution for anything from dashboards to distributed machine learning, deep learning, and GPU training.

Today you’ll learn how to:

  • Install Prefect locally
  • Write a simple ETL pipeline with Python
  • Use Prefect to declare tasks, flows, parameters, and schedules, and to handle failures
  • Run Prefect in Saturn Cloud

How to Install Prefect Locally

We’ll install the Prefect library inside a virtual environment. The following commands will create and activate the environment named prefect_env through Anaconda, based on Python 3.8:
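
A minimal sketch of those two commands, assuming a standard Anaconda or Miniconda installation with conda available on the PATH:

```shell
# Create a new conda environment named prefect_env, pinned to Python 3.8
conda create --name prefect_env python=3.8

# Activate the environment so that later installs go into it
conda activate prefect_env
```

conda asks for confirmation before it creates the environment, so expect to answer y at the prompt.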
