
Introduction
Apache Airflow 2 is finally out. Among the new features announced (a brand new scheduler, stable REST APIs and much more), a production-ready Dockerfile has been released. This allows developers to deploy Airflow using modern cloud technologies rather than installing it on bare metal.
In this article we will take a look at how to create a local environment using VirtualBox and Vagrant, install Docker and deploy a production-ready Airflow container. Let’s start!
Local Environment
In this section we will see how to create a virtual machine, install Docker and configure a shared folder to easily access local files in the guest machine. Feel free to skip this part if you don’t want to use a VM for your experiments.
Configure a virtual machine
Assuming you have already installed VirtualBox and Vagrant on your local machine, we need a manifest (Vagrantfile) that describes the virtual machine we want to build; a sketch is shown after the list below.
This manifest specifies what the "airflow-vm" machine (its hostname) must look like:
- based on ubuntu/xenial64 image
- 4GB of RAM
- uses IP 192.168.1.200
- mounts a shared folder (which must be created in the same directory as the Vagrantfile) under /opt/airflow
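A possible Vagrantfile matching these specs (a minimal sketch; adapt the box, IP and folder paths to your setup):
# Vagrantfile - minimal sketch matching the specs above
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"                    # base image
  config.vm.hostname = "airflow-vm"
  config.vm.network "private_network", ip: "192.168.1.200"
  config.vm.synced_folder "./shared", "/opt/airflow"   # shared folder next to the Vagrantfile

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096                                   # 4GB of RAM
  end
end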
Open a new terminal window, move to the Vagrantfile path and simply type:
vagrant up
vagrant ssh
Install Docker
Now it’s time to install Docker on this brand new VM. We will use the official docker-install script by typing:
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
If everything went ok we should be able to run the following command:
sudo docker run hello-world
Airflow 2 on Docker container
In this section we will start from the official Docker image (v2.0 stable), adding a thin layer on top of it to automatically create an admin account and to easily configure an external Postgres database, enabling parallelism through the LocalExecutor.
The approach
Let’s start from the end (yes, Nolan changed my mind with Tenet). Our goal is to create a stack composed of Airflow 2 (the web server and the scheduler can be deployed separately, but for this article an all-in-one solution might be more appropriate) and a Postgres database. We will use the following docker-compose file:
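A minimal sketch of such a compose file, assembled from the snippets discussed below (the compose file version and the top-level named volume declaration are assumptions):
version: "3.7"

services:

  postgres:
    image: postgres:12-alpine
    env_file:
      - postgres.env
    volumes:
      - postgres:/data/postgres
      - ./scripts/postgres/:/docker-entrypoint-initdb.d/

  server:
    image: airflow2-docker:1.0.0 # build this first
    env_file:
      - airflow.env
      - airflow_db.env
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags

volumes:
  postgres: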
Here we are defining 2 services: postgres (our db) and server (airflow).
Let’s break it down!
The approach – Postgres database

postgres:
image: postgres:12-alpine
env_file:
- postgres.env
volumes:
- postgres:/data/postgres
- ./scripts/postgres/:/docker-entrypoint-initdb.d/
Postgres (v12 based on alpine) is initialized with an external environment file:
POSTGRES_USER=airflow
POSTGRES_PASSWORD=<db_pass_here>
POSTGRES_DB=airflow
PGDATA=/data/postgres
and a 00_init.sql file (inside the scripts/postgres directory) that simply creates a table:
CREATE TABLE airflow__extra_conf(
conf_name VARCHAR (255) PRIMARY KEY,
conf_value VARCHAR (255) NOT NULL
);
Postgres data are persisted by creating a volume mapped to /data/postgres.
The approach – Airflow server

server:
image: airflow2-docker:1.0.0 # build this first
env_file:
- airflow.env
- airflow_db.env
ports:
- "8080:8080"
volumes:
- ./dags:/opt/airflow/dags
The Airflow server uses a custom Docker image (described in the next section) built on top of the official 2.0 stable version. We use two environment files: airflow.env (Airflow configuration) and airflow_db.env (database configuration).
Here is a minimal airflow.env that you can extend based on your needs:
# -- CORE
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__CORE__LOAD_EXAMPLES=False
# -- WEBSERVER
AIRFLOW__WEBSERVER__BASE_URL=http://192.168.1.200 # here your VM IP
# -- SCHEDULER
AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=60
# -- ADMIN
SECURITY__ADMIN_USERNAME=<admin_username>
SECURITY__ADMIN_FIRSTNAME=<admin_firstname>
SECURITY__ADMIN_LASTNAME=<admin_lastname>
SECURITY__ADMIN_EMAIL=<admin_email>
SECURITY__ADMIN_PASSWORD=<admin_password>
Note that since we don’t want to use the default SQLite database, we have specified LocalExecutor as the core executor. The LocalExecutor will also allow task parallelism in our DAGs.
airflow_db.env contains the external db information we have set in the previous step:
# -- DB
DB__HOST=airflow_postgres #docker stack name + _postgres
DB__PORT=5432
DB__USERNAME=airflow
DB__PASSWORD=<db_pass_here>
DB__NAME=airflow
We will also mount a dags folder where we will place our DAG files.
Airflow 2 Dockerfile
Let’s see what the custom Airflow image looks like:
FROM apache/airflow:2.0.0-python3.7
USER root
# INSTALL TOOLS
RUN apt-get update \
    && apt-get -y install libaio-dev \
    && apt-get -y install postgresql-client
RUN mkdir extra
USER airflow
# COPY SQL SCRIPT
COPY scripts/airflow/check_init.sql ./extra/check_init.sql
COPY scripts/airflow/set_init.sql ./extra/set_init.sql
# ENTRYPOINT SCRIPT
COPY scripts/airflow/init.sh ./init.sh
ENTRYPOINT ["./init.sh"]
This basically installs some dependencies (we will see why postgresql-client is needed in a moment), adds two SQL scripts and runs a provided shell script (init.sh). Let’s see what they do, starting from the end (again, thanks Nolan):
Once the db is available, the script checks whether an admin user already exists. How? Do you remember the airflow__extra_conf table? It’s used to store a flag that marks a completed server initialization.
Let’s take a look at check_init.sql:
SELECT count(conf_name) FROM airflow__extra_conf WHERE conf_name='IS_INITIALIZED';
If there is no IS_INITIALIZED entry, the shell script creates the admin account and then sets the flag using set_init.sql:
INSERT INTO airflow__extra_conf VALUES('IS_INITIALIZED','1');
In this way the admin user can be created only once without relying on alternative solutions (like persisting files in volumes).
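init.sh itself is the glue between these pieces. A possible sketch, assuming the DB__* and SECURITY__ADMIN_* variables from the env files above are used to build the connection string, wait for the database and create the admin user (the exact commands may differ from the original script):
#!/usr/bin/env bash
set -e

# Build the metadata db connection string from the DB__* variables (assumed convention)
export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://${DB__USERNAME}:${DB__PASSWORD}@${DB__HOST}:${DB__PORT}/${DB__NAME}"

# Wait until Postgres accepts connections (pg_isready ships with postgresql-client)
until pg_isready -h "${DB__HOST}" -p "${DB__PORT}" -U "${DB__USERNAME}"; do
  echo "Waiting for Postgres.."
  sleep 5
done

# Create/upgrade the Airflow metadata schema
airflow db upgrade

# Check the flag in airflow__extra_conf: create the admin user only on the first run
IS_INITIALIZED=$(PGPASSWORD="${DB__PASSWORD}" psql -h "${DB__HOST}" -p "${DB__PORT}" -U "${DB__USERNAME}" -d "${DB__NAME}" -tA -f ./extra/check_init.sql)

if [ "${IS_INITIALIZED}" = "0" ]; then
  echo "Creating admin user.."
  airflow users create \
    --role Admin \
    --username "${SECURITY__ADMIN_USERNAME}" \
    --firstname "${SECURITY__ADMIN_FIRSTNAME}" \
    --lastname "${SECURITY__ADMIN_LASTNAME}" \
    --email "${SECURITY__ADMIN_EMAIL}" \
    --password "${SECURITY__ADMIN_PASSWORD}"

  PGPASSWORD="${DB__PASSWORD}" psql -h "${DB__HOST}" -p "${DB__PORT}" -U "${DB__USERNAME}" -d "${DB__NAME}" -f ./extra/set_init.sql
fi

# All-in-one: scheduler in the background, webserver in the foreground
airflow scheduler &
exec airflow webserver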
The final project structure should look like the following:
|-- Vagrantfile
|-- shared
    |-- dags
    |-- scripts
    |   |-- airflow
    |   |   |-- check_init.sql
    |   |   |-- set_init.sql
    |   |   |-- init.sh
    |   |-- postgres
    |       |-- 00_init.sql
    |-- Dockerfile
    |-- docker-compose.yaml
    |-- airflow.env
    |-- airflow_db.env
    |-- postgres.env
Ready to deploy
Assuming you have updated the environment files with your own IP, database password and admin credentials, you are ready to deploy Airflow using Docker.
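Before deploying, the custom image referenced in docker-compose.yaml has to be built and, since docker stack deploy relies on Swarm mode, Swarm has to be initialized once (the --advertise-addr value is simply our VM IP):
cd /opt/airflow
sudo docker build -t airflow2-docker:1.0.0 .            # tag must match docker-compose.yaml
sudo docker swarm init --advertise-addr 192.168.1.200   # only needed the first time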
cd /opt/airflow
sudo docker stack deploy -c docker-compose.yaml airflow
Airflow server logs can be seen using:
sudo docker service logs -f airflow_server
Once you see the message "Creating admin user.." and the Airflow logo, you are ready to log in using the UI.

Open your browser and go to http://192.168.1.200:8080. Enter your admin credentials and voilà!
What’s next?
Now you are able to place your DAGs in the dags folder and run them through the UI.
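As a quick check, a minimal DAG (file name and task are just placeholders) could look like this:
# dags/hello_airflow.py - a minimal DAG to verify the deployment
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_airflow2",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # A single task that just prints a message
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from Airflow 2!'")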
In the next article we will see how to set up a Docker Swarm cluster and distribute workloads across it using the DockerSwarmOperator.