How to develop a CD pipeline for Airflow

Leveraging GitHub actions and Astronomer to quickly push code updates to a production environment

Chris Young
Towards Data Science


Photo by Mike Benna on Unsplash

Apache Airflow is a popular data orchestration tool used to manage workflows and tasks. However, one of the big questions I keep running into is how to deploy production-ready instances of Airflow. Options for hosting Airflow include self-managing it on a virtual machine, deploying to the cloud-based platform Astronomer, leveraging Amazon Managed Workflows for Apache Airflow (MWAA), and more.

Of these options, I have found Astronomer to be the best value in terms of ease of use and cost per month, though I will not be diving into platform-specific comparisons in this article.

Astronomer is a software/infrastructure-as-a-service platform that lets data teams focus on their workflows and pipelines rather than on infrastructure and scaling. In this article, I will demonstrate how to automate a continuous deployment (CD) pipeline from a local Airflow environment to Astronomer in less than 10 minutes.

Requirements

  1. Astro CLI (docs)
  2. Astronomer account
  3. Docker
  4. Local Airflow project running via the Astro CLI
  5. GitHub account and repository for the local Airflow project
  6. GitHub Actions

The first step is initializing a local Airflow project with the Astro CLI. Initialize your project with astro dev init, then test your local environment with astro dev start.
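For reference, those commands look like the following; the project directory name is purely illustrative.

mkdir airflow-project && cd airflow-project   # any empty directory works
astro dev init                                # scaffolds dags/, include/, Dockerfile, requirements.txt, etc.
astro dev start                               # starts the local Airflow containers (webserver, scheduler, database)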

Initialize a new Git repository in your project directory (git init) and push to GitHub.
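Assuming a brand-new repository, that is the standard sequence below; swap in your own remote URL.

git init
git add .
git commit -m "Initial Airflow project"
git remote add origin https://github.com/<your-username>/<your-repo>.git   # placeholder URL
git push -u origin main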

Astronomer

Within your Astronomer account, ensure that there is an active deployment. Create a new service account within that deployment; the service account acts as the liaison between your GitHub repo and your deployed instance of Airflow. Be sure to copy the API key when creating the service account, as it will be your only opportunity to do so.

Astronomer dashboard and service account (image by author)

Using the Astro CLI, log in to your Astronomer account with astro login or astro auth login. This will prompt you for your email and password, or OAuth credentials. Then run astro cluster list to get your base domain (which will be referenced later), for example gcp0001.us-east4.astronomer.io.
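Depending on your CLI version, the sequence looks roughly like this; the base domain shown is just the example from above.

astro auth login gcp0001.us-east4.astronomer.io   # or simply: astro login
astro cluster list                                # lists the clusters / base domains for your account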

To verify the creation of your service account, run the following in a terminal:

# Your base domain and the service account API key from the previous step
export BASE_DOMAIN=gcp0001.us-east4.astronomer.io
export API_SECRET_KEY=<your_secret_key_for_service_account>

# Authenticate against the Astronomer Docker registry
docker login registry.${BASE_DOMAIN} -u _ -p ${API_SECRET_KEY}

If you are able to authenticate through these commands, everything is on track.

GitHub

Create a new GitHub repository secret called ASTRONOMER_SERVICE_ACCOUNT_KEY and paste in the Astronomer service account API key. The GitHub Actions workflow for Astronomer will reference this secret when pushing code to production.
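If you prefer the command line over the repository settings page, the GitHub CLI can set the same secret; this assumes gh is installed and authenticated for the repository, and that the key is available in an environment variable.

# Assumes the GitHub CLI (gh) is authenticated for this repository
gh secret set ASTRONOMER_SERVICE_ACCOUNT_KEY --body "$API_SECRET_KEY"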

Next, create a new workflow in your repository and select the Docker image template by GitHub Actions. This workflow builds a Docker image to deploy or push to a registry. Use a preconfigured main.yml file (sketched below) to build your Dockerized Airflow image.

Be sure to substitute your own release name for primitive-asteroid-5670; you can find it in your Astronomer deployment settings.

Also note that you will need to set the working directory (line 13 in my main.yml) to the location of your Dockerfile. When the workflow runs on GitHub Actions, it checks your repository out into a fresh environment and needs to know where the Dockerfile lives. In my repo, the Dockerfile is located in the airflow subdirectory.
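Since the original gist is not reproduced here, below is a minimal sketch of what such a main.yml can look like. The release name, base domain, and airflow working directory are taken from my setup and should be replaced with your own; the exact registry path may vary with your Astronomer installation.

name: CD - Deploy Airflow to Astronomer

on:
  push:
    branches: [ main ]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: airflow   # directory containing the Dockerfile
    steps:
      - uses: actions/checkout@v2
      - name: Build and push image to the Astronomer registry
        env:
          BASE_DOMAIN: gcp0001.us-east4.astronomer.io   # your base domain
          RELEASE_NAME: primitive-asteroid-5670          # your deployment release name
        run: |
          TAG=cd-$(git rev-parse --short HEAD)
          docker login registry.$BASE_DOMAIN -u _ -p ${{ secrets.ASTRONOMER_SERVICE_ACCOUNT_KEY }}
          docker build -t registry.$BASE_DOMAIN/$RELEASE_NAME/airflow:$TAG .
          docker push registry.$BASE_DOMAIN/$RELEASE_NAME/airflow:$TAG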

Testing

We are now ready to test! Open a PR on GitHub to merge changes into dev or main. Merge the code into the desired branch and the workflow will run. When it completes, check your Astronomer dashboard and you will see the current tag in the format cd-<commit hash>.

Photo by author. Example deployment tag in Astronomer.

Your continuous delivery/deployment pipeline is now enabled for your Airflow repository! For additional information and debugging help, check out these docs.

As a recap, in this article we automated the deployment of a local Astro/Airflow environment to production on Astronomer through GitHub Actions, laying the foundation for a CI/CD pipeline.

CI/CD enables teams to adopt a more “agile” framework for development. Some of my favorite objectives of CI/CD pipelines include:

  1. Efficiency — reduce time spent establishing development environments
  2. Automatic releases — avoid blockages and errors related to manual steps
  3. Deploy code fast — remove bottlenecks and get code into production sooner
  4. Embrace iteration — make feedback cycles smaller and shorter with frequent changes rather than monolithic commits and releases

Check out this post to see more objectives related to Agile and CI/CD.

We implemented a simple CD pipeline (only a single production environment), but we could also leverage multiple deployments on Astronomer to quickly scale up and include dev and QA environments. This would enable a more robust development and release process for your Airflow environments.
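As a rough sketch of how that might look, the deploy step in the workflow above could pick a target release based on the branch that triggered the run; the dev release name here is purely hypothetical, and you would also add the dev branch to the push trigger and create a matching deployment (and service account key) in Astronomer.

      - name: Select target deployment by branch
        run: |
          if [ "${GITHUB_REF##*/}" = "main" ]; then
            echo "RELEASE_NAME=primitive-asteroid-5670" >> "$GITHUB_ENV"   # production release
          else
            echo "RELEASE_NAME=my-dev-release-1234" >> "$GITHUB_ENV"       # hypothetical dev/QA release
          fi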

