MLOps for Batch Processing: Running Airflow on GPUs

A simple workaround for Apache Airflow limitations

Mathieu Lemay
Towards Data Science


Photo by Tom Fisk from Pexels.

There are a lot of MLOps platforms that natively handle GPU access, but we enjoy the simplicity of Airflow for some of our tasks. This article looks at an approach that works for both PyTorch and TensorFlow stacks.

There are already a few articles about this type of implementation, but creating your own AWS AMI for Batch, or relying on other cloud-specific platforms, is not the most elegant way of solving what seems like a straightforward problem.

Background and Motivation

If you’ve spent a lot of time packaging up your models in nice little containers and tying them to your CI/CD pipeline, once in a while those models will still require raw GPU firepower. This is especially true for large batch jobs, something we see often with our AuditMap internal audit and enterprise risk clients. We also use a lot of BERT-based models for data pre-processing and cleaning on most NLP projects, so batching inference work on GPUs makes a lot of sense.

There are some out-of-the-box limitations, however, that we’ll address and resolve as part of this article.

Airflow

Apache Airflow is an open-source task scheduler and workflow manager. As opposed to end-to-end MLOps solutions, it does one thing really well: run jobs. Airflow comes out of the box with many task types, including PythonOperator and DockerOperator, which let you run a Python function or spin up a container, respectively.

Under these conditions, it would be sensible to assume that the DockerOperator provides all of the native Docker functionality, including target device assignment. However, as of October 2021, there was still no major development activity on this specific capability. There has been plenty of chatter around a device_requests parameter on the Airflow GitHub page, but neither the documentation nor the source code had this parameter enabled.

And I still want my team to run Airflow DAGs on GPUs, so here we are.

The Workaround

Although Airflow already uses docker-py internally, we’re going to install it as an external dependency and make calls to the Docker daemon programmatically through Python. From there, the external containers will execute their code on the target GPUs as a single-run script.

This requires an external container with a Python script inside it that can access the target data (more on that below). Given a few structural best practices, we end up with a container that can run as an Airflow task.

A diagram view of the call logic.

So what we’re doing is running a Python script inside an external container, which is itself launched by an Airflow Python task. This sidesteps the DockerOperator’s limitations and makes the call to the native Docker engine directly.


Example: Basic DAG

To get our feet wet, below is a simple nvidia-smi test that will be run inside of an external container (in this case, tensorflow:2.7.0-gpu):
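The sketch below shows one way to wire that up: a PythonOperator whose callable uses docker-py to launch the container with a GPU device request. The DAG and task names match the gpu_test and check_gpu names used further down, but treat this as an approximation rather than the exact code from our repo.

    # A PythonOperator task that launches a GPU-enabled container through docker-py
    # and logs the nvidia-smi output.
    from datetime import datetime

    import docker
    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def check_gpu():
        client = docker.from_env()
        output = client.containers.run(
            image="tensorflow/tensorflow:2.7.0-gpu",
            command="nvidia-smi",
            device_requests=[
                docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
            ],
            remove=True,
        )
        print(output.decode("utf-8"))


    with DAG(
        dag_id="gpu_test",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        PythonOperator(task_id="check_gpu", python_callable=check_gpu)

The DeviceRequest with count=-1 is the docker-py equivalent of docker run --gpus all; you can pin specific cards with device_ids instead.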

Getting it to work

Assuming your starting point is Airflow’s docker-compose.yaml file, here are the changes to make:

  • Add docker-py as a PIP requirement:

    x-airflow-common:
      ...
      environment:
        ...
        _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-docker==5.0.3}
  • Mount the volume containing the Docker socket:

    x-airflow-common:
      ...
      volumes:
        ...
        - /var/run/docker.sock:/var/run/docker.sock

(Note: If you’re eventually using the DockerOperator, then this can be included as a parameter.)

From there, given proper tunneling and no port collisions, you can run docker-compose up to get Airflow running. The main web server is now available on port 8080. If you’ve taken the code from our GitHub repo, two new DAGs will appear:

The two additional DAGs as part of our repo.

Running the gpu_test DAG gives us a lot of confidence.

Results

As expected, we have a nice DAG run success and a completion log with an nvidia-smi printout:

The successful DAG run graph. Beautiful.
The nvidia-smi logged from the check_gpu() call.

Advanced Scripting

The next step is to create a container that can:

  1. hold the production models;
  2. run the inference code; and
  3. load the target data and save the results.

If you start off with a containerized design, then it becomes easier to assign all of the inference tasks to a single script. Let’s create a simple, multi-task inference script that can take parameters. We’ll create a simple translation pipeline from Helsinki-NLP. We’ll also use some sample data from Kaggle to jumpstart the testing process.
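The sketch below captures the general shape of such a script: a command-line entry point that loads a Helsinki-NLP MarianMT checkpoint through the Hugging Face transformers pipeline and translates one column of a CSV. The argument names, column names, and default model are placeholders rather than the exact code from our repo.

    # Single-run batch inference script meant to live inside the container.
    import argparse

    import pandas as pd
    from transformers import pipeline


    def main():
        parser = argparse.ArgumentParser(description="Batch translation inference.")
        parser.add_argument("--input-csv", required=True, help="CSV with a 'text' column.")
        parser.add_argument("--output-csv", required=True, help="Where to write the results.")
        parser.add_argument("--model", default="Helsinki-NLP/opus-mt-fr-en",
                            help="Any Helsinki-NLP MarianMT checkpoint.")
        args = parser.parse_args()

        # device=0 targets the first GPU visible inside the container.
        translator = pipeline("translation", model=args.model, device=0)

        df = pd.read_csv(args.input_csv)
        df["translation"] = [
            result["translation_text"] for result in translator(df["text"].tolist())
        ]
        df.to_csv(args.output_csv, index=False)


    if __name__ == "__main__":
        main()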

This script can now execute an inference task with parameters sent to it. Let’s update our DAG to make the call:
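Again, treat this as a sketch: the image name, script path, and volume mounts below are placeholders for whatever your own container and data layout look like. The docker-py pattern is the same as in check_gpu(), with parameters passed through the container command.

    import docker


    def run_batch_translation():
        client = docker.from_env()
        logs = client.containers.run(
            image="inference-worker:latest",  # hypothetical image holding the script and models
            command=(
                "python /app/inference.py "
                "--input-csv /data/reviews.csv "
                "--output-csv /data/reviews_translated.csv"
            ),
            volumes={"/mnt/shared/data": {"bind": "/data", "mode": "rw"}},
            device_requests=[
                docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
            ],
            remove=True,
        )
        print(logs.decode("utf-8"))


    # Wired into the DAG exactly like check_gpu():
    # PythonOperator(task_id="run_batch_translation", python_callable=run_batch_translation)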

Results

Success!

Another successful Airflow run.
The log confirming the successful run.

And the GPU indeed gets hit during this operation:

The nvidia-smi printout.

Next Steps

There are a few infrastructure items that you’ll need to clear up before running this successfully:

  1. Map the data folders and data sources.
  2. Load the correct models.
  3. Ensure that nvidia-docker and GPU access are available on your target machine.

You should also add the models programmatically as part of your CI/CD pipeline with MLflow, PMML, or other high-quality tooling.
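With MLflow, for instance, a container entrypoint can pull the latest promoted model at startup; in the sketch below, the tracking URI and registered model name are placeholders.

    # Fetch the current Production model from an MLflow registry at container startup,
    # so the task always runs the latest promoted version.
    import mlflow

    mlflow.set_tracking_uri("http://mlflow.internal:5000")  # placeholder tracking server
    model = mlflow.pyfunc.load_model("models:/translation-model/Production")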

Discussion

Although a bit convoluted, this design pattern allows for a lot of interchangeability between the various inference activities. It also allows for the models to be segregated as part of the CI/CD/MLOps pipeline, as the containers will always take the latest models when starting (if the models are loaded from an external volume).

There is a bit of vigilance required to prevent GPU resource contention, but our experience with BERT models has taught us that you can comfortably run multiple models on a 6–8 GB memory budget per model. (Make sure to set incremental memory growth for TensorFlow, however, as shown below.)
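In TensorFlow, that is a short snippet executed before any model is loaded:

    # Enable incremental GPU memory growth so each process only claims what it needs
    # instead of reserving the entire card at startup.
    import tensorflow as tf

    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.experimental.set_memory_growth(gpu, True)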

Notes

The inspiration for this article came from Aquater on StackOverflow. Good job Aquater.

Happy Pipelining!

-Matt.

If you have additional questions about this article or our AI consulting framework, feel free to reach out by LinkedIn or by email.


Matt Lemay, P.Eng (matt@lemay.ai) is the co-founder of lemay.ai, an international enterprise AI consultancy, and of AuditMap.ai, an internal audit platform.