Create Reusable ML Modules with MLflow Projects & Docker

Because you never just train your model once.

George Novack
Towards Data Science


Let’s face it: getting a Machine Learning model into production isn’t easy. From pulling together data from disparate sources to tuning hyperparameters and evaluating model performance, the road from sandbox to production involves so many steps that, if we’re not careful, they can and will end up scattered across a number of interdependent notebooks or scripts that must all be run in the correct order to recreate the model. To make matters worse, as we iterate on and evaluate new versions of our model, it’s easy to neglect to document each combination of hyperparameters and each resulting metric, so we often lose track of the lessons learned during the iterative model-building process.

And remember those notebooks that we strung together to produce version 1 of the model? Well, by the time we’re finished building version 2, those notebooks have been tweaked and altered so much that we’d have no hope of recreating version 1 if we ever needed to.

In this article, I’ll show how you can use MLflow and Docker to create ML Projects that are modular and reusable, and that allow you to easily recreate old versions of your model and tune parameters to build and evaluate new ones.

What is MLflow?

MLflow is an open-source suite of tools that help manage the ML model development lifecycle from early experimentation and discovery, all the way to registering the model in a central repository and deploying it as a REST endpoint to perform real-time inference. We won’t be covering the model registry or model deployment tools in this article. Our focus will be on MLflow Tracking, which allows us to evaluate and record model results as we quickly iterate through different versions of the model, and MLflow Projects, which we will use to package our model development workflow into a reusable, parameterized module.

What We’ll Build

In this article, we’ll use TensorFlow and the CelebA dataset to build a basic Convolutional Neural Network (CNN) to predict whether the subject of a given image is smiling. We’ll create a Docker image that will act as our training environment and will contain all the dependencies required to train the model. Next, we’ll package the model training code as an MLflow Project, and finally, we’ll create a simple driver program that will kick off multiple runs of the Project asynchronously using different hyperparameter values.

Local Setup

In order to follow along, you’ll need Python 3, Docker, and MLflow installed locally. You can install MLflow using pip:

pip install mlflow

Building the Model

Since our focus is on MLflow, I won’t go into great detail on the actual model, but I will briefly go over some code examples and include links to a working end-to-end project at the end of this article.

The first order of business is to load the CelebA dataset. We’ll do this using TensorFlow Datasets (tfds).

The dataset loaded from tfds doesn’t contain explicit target values. Each record is just an image and a set of attributes about the image (e.g. whether or not the person in the image is smiling, is wearing a hat, has a mustache, etc.), so the data_generator() function is used to format the data in a way that can be passed into a model. The function returns a generator that yields each record as a tuple of the following format:

(image, 1 if subject is smiling else 0) 
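As a rough sketch, the loading code looks something like this. The 'Smiling' attribute key and the image/attributes structure come from the tfds CelebA feature dictionary, the /app/data directory matches where our Docker image (described later) will store the dataset, and the function signature itself is illustrative:

import tensorflow as tf
import tensorflow_datasets as tfds

# Load the CelebA dataset. Each record is a dict with an 'image'
# and an 'attributes' dict of boolean facial attributes
# (Smiling, Wearing_Hat, Mustache, etc.).
train_data = tfds.load('celeb_a', split='train', data_dir='/app/data')
validation_data = tfds.load('celeb_a', split='validation', data_dir='/app/data')

def data_generator(dataset, num_samples):
    # Yield each record as an (image, label) tuple, where the label
    # is 1 if the subject is smiling and 0 otherwise.
    for record in dataset.take(num_samples):
        image = tf.cast(record['image'], tf.float32) / 255.0
        label = 1 if record['attributes']['Smiling'] else 0
        yield image, label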

We’ll create the training and validation datasets for the model by using tf.data.Dataset.from_generator and passing in the data_generator() function.
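In sketch form, assuming the standard 218×178 CelebA image size (the sample counts and batch size below are placeholder values that will later become project parameters):

# Placeholder values; these will come from the project parameters later on.
batch_size = 32
num_training_samples = 5000
num_validation_samples = 1000

# The output signature tells TensorFlow the shape and dtype of each
# element yielded by the generator.
output_signature = (
    tf.TensorSpec(shape=(218, 178, 3), dtype=tf.float32),
    tf.TensorSpec(shape=(), dtype=tf.int32),
)

train_dataset = tf.data.Dataset.from_generator(
    lambda: data_generator(train_data, num_training_samples),
    output_signature=output_signature
).batch(batch_size)

validation_dataset = tf.data.Dataset.from_generator(
    lambda: data_generator(validation_data, num_validation_samples),
    output_signature=output_signature
).batch(batch_size)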

Next, we’ll build and train a CNN using the training and validation datasets.
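In outline, the model code looks something like the following; the layer sizes and filter counts here are illustrative rather than the exact architecture, and the datasets come from the previous sketch:

from tensorflow.keras import layers, models

# A small CNN for binary classification: is the subject smiling?
model = models.Sequential([
    layers.Input(shape=(218, 178, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_dataset, validation_data=validation_dataset, epochs=10)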

Creating the Project Environment

MLflow Projects allows you to define a project’s environment in three different ways: a Conda environment, a Docker container, or the local system. We’ll be using a Docker container for our project environment.

Here’s the Dockerfile for our simple project environment:
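(The Python base image tag below is an assumption; the requirements and data-loading steps match the description that follows.)

FROM python:3.8

WORKDIR /app

# Install the project dependencies: mlflow, tensorflow, and tensorflow-datasets.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Download the CelebA dataset into /app/data at build time so it
# doesn't have to be fetched on every project run.
COPY load_data.py .
RUN python load_data.py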

The requirements.txt file contains the packages needed to run the project: mlflow, tensorflow, and tensorflow-datasets. And the load_data.py script simply loads the CelebA dataset from TensorFlow and stores the results in the /app/data directory.
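For reference, load_data.py can be as simple as a single tfds call (sketched here, with the data_dir matching the /app/data directory mentioned above):

# load_data.py: download CelebA once at image build time and cache
# it under /app/data.
import tensorflow_datasets as tfds

tfds.load('celeb_a', data_dir='/app/data')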

Note: In a real-world scenario, you would probably not store the training/validation data along with the project environment. Instead, your environment would just include the configuration and libraries needed to access the data wherever it is stored (e.g. an on-prem database or a cloud storage account). I’m only doing this here to avoid downloading the dataset from TensorFlow every time the project is run.

Packaging the Training Code

We’ll now package the model training code from earlier into an MLflow Project. An MLflow Project is simply a directory with an MLproject file that defines a few things about the project:

  • The environment in which the project will run. For us, this is the Docker image that we just created.
  • The project entry point. This will be the python script that builds and trains the model.
  • The parameters that can be passed into the project. We’ll define a few of these below.

The first step is to create an MLproject file. It is in this file that we reference the Docker image that will be used as the project environment. We’ll also define any parameters that can be passed into the project.
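A sketch of the MLproject file is shown below. The Docker image name and the convolutions and randomize-images parameters match what’s described next; the remaining parameter names, the defaults, and the train.py entry-point script are illustrative stand-ins:

name: celeb-cnn

docker_env:
  image: gnovack/celebs-cnn

entry_points:
  main:
    parameters:
      batch-size: {default: 32}
      epochs: {default: 10}
      convolutions: {default: 2}
      training-samples: {default: 5000}
      validation-samples: {default: 1000}
      randomize-images: {default: 'false'}
    command: >-
      python train.py
      --batch-size {batch-size}
      --epochs {epochs}
      --convolutions {convolutions}
      --training-samples {training-samples}
      --validation-samples {validation-samples}
      --randomize-images {randomize-images}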

As you can see, our project will use the gnovack/celebs-cnn image as the project environment (this is the Docker image created in the previous section) and will accept a number of parameters: the batch size, number of epochs, number of convolution layers, number of training and validation samples, and a boolean indicating whether or not to perform some random transformations on the input images during training.

Next, we’ll modify the model training code to use the parameters passed in and to log training progress using MLflow Tracking. We’ll discuss MLflow Tracking shortly, but for now just know it consists of a Tracking Server, which tracks parameters and metrics during model training runs (which MLflow calls Experiment runs), and a GUI that allows us to view all of our runs and visualize the performance metrics of each run.

We can access the project input parameters using argparse like we would with any command-line arguments.
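For example (the flag names mirror the parameters sketched in the MLproject file above):

import argparse

# Parse the parameters passed in by the MLflow Project run.
parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=32)
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--convolutions', type=int, default=2)
parser.add_argument('--training-samples', type=int, default=5000)
parser.add_argument('--validation-samples', type=int, default=1000)
parser.add_argument('--randomize-images',
                    type=lambda v: str(v).lower() == 'true',
                    default=False)
args = parser.parse_args()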

Then we can use these parameters to construct the CNN dynamically. We’ll also wrap the model training in an MLflow run using mlflow.start_run() to tell MLflow to track our model training as an MLflow Experiment run.
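Putting it together, a sketch of the parameterized training code looks something like the following; it assumes the parsed args and the datasets from the earlier sketches, the layer sizes are illustrative, and MLFlowCallback is the custom callback described in the notes below:

import mlflow
import mlflow.tensorflow
import tensorflow as tf
from tensorflow.keras import layers, models

class MLFlowCallback(tf.keras.callbacks.Callback):
    # Send the metrics for each epoch to the MLflow tracking server.
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metrics(logs or {}, step=epoch)

with mlflow.start_run():
    # Automatically log TensorFlow parameters, metrics, and the trained model.
    mlflow.tensorflow.autolog()

    model = models.Sequential()
    model.add(layers.Input(shape=(218, 178, 3)))

    # Randomly flip training images only if requested. The layer is only
    # active during training. (In older TF 2.x versions it lives under
    # layers.experimental.preprocessing.RandomFlip.)
    if args.randomize_images:
        model.add(layers.RandomFlip('horizontal'))

    # Add the requested number of convolution layers.
    for i in range(args.convolutions):
        model.add(layers.Conv2D(32 * (2 ** i), (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D())

    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    model.fit(train_dataset,
              validation_data=validation_dataset,
              epochs=args.epochs,
              callbacks=[MLFlowCallback()])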

A few notes about the above code:

  • mlflow.tensorflow.autolog() enables automatic logging of parameters, metrics, and models for TensorFlow. MLflow supports automatic logging for several Machine Learning frameworks. See the full list here: https://mlflow.org/docs/latest/tracking.html#automatic-logging.
  • The RandomFlip layer, which randomly flips the images during training to help prevent overfitting, is now added to the model conditionally based on the value of the randomize-images parameter.
  • The number of convolution layers in the model is now dependent on the value of the convolutions parameter.
  • A Custom Callback has been added to the model.fit call. The MLFlowCallback is a simple Keras Callback class that sends model performance metrics to the MLflow tracking server after each training epoch using mlflow.log_metrics().

Writing the Driver

With our Docker environment defined and our MLflow project created, we can now write a driver program to execute a few runs of the project asynchronously, allowing us to evaluate different combinations of hyperparameters and neural network architectures.

Before running the project, we’ll need to start the MLflow tracking server. For this example, we’ll just use the local machine as the tracking server, but in an environment where you’re working on a team of engineers and/or data scientists, you would probably want to stand up a shared, always-running tracking server for all team members to use. Some cloud services, like Databricks and Azure Machine Learning, even have a built-in MLflow tracking server.

To run a local tracking server and open the MLflow GUI, run the following command:

mlflow ui

We’ll use the mlflow.projects.run() method with a link to the MLflow Project on GitHub, https://github.com/gnovack/celeb-cnn-project, to run the project (you could also use a relative file path to a local directory containing the MLflow Project). The driver script runs the project three times asynchronously with different parameters.
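In sketch form (the specific parameter values are just examples, and it’s assumed the MLflow tracking URI points at the local tracking server started above):

import mlflow
import mlflow.projects

project_uri = 'https://github.com/gnovack/celeb-cnn-project'

# Kick off three runs of the project with different numbers of
# convolution layers. synchronous=False returns immediately, so the
# runs execute in parallel.
submitted_runs = [
    mlflow.projects.run(
        project_uri,
        parameters={'convolutions': convolutions, 'randomize-images': True},
        backend='local',
        synchronous=False,
    )
    for convolutions in [0, 2, 4]
]

# Optionally block until all three runs have finished.
for run in submitted_runs:
    run.wait()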

We specify synchronous=False so that we can execute all three runs in parallel, and backend='local' indicates that the project will execute on your local machine. MLflow also supports executing Projects on Databricks, or on a Kubernetes cluster.

After executing the driver program, go to http://localhost:5000/ to see the three active runs in the MLflow tracking UI.

MLflow Runs. (Image by author)

By drilling into each run, we can view model performance metrics, which, thanks to the MLFlowCallback we created, are updated after each training epoch, allowing us to plot these metrics over time while the model is still being trained.

MLflow Tracking metrics. (Image by author)

As you can see from the accuracy numbers, our model isn’t going to win any awards after just a few epochs of training on a small set of data, but you can start to see trends in the performance of the different models (e.g. the top model, which used no convolution layers, seems to be overfitting, as its training accuracy rises steadily while its validation accuracy remains fairly static). Once each run finishes, it outputs the trained model as an artifact that can be downloaded from the tracking server, which means that with MLflow Tracking we have access not only to the parameters and metrics of historical training runs, but to the trained models as well.

When working with more complex models and larger sets of training data, the training process can easily take several hours or days to complete, so being able to view these metrics in real-time allows us to determine which combinations of parameters might or might not yield production-quality models, and gives us the opportunity to stop runs once we start seeing trends like overfitting in the performance metrics.

Conclusion

In this article, we’ve seen how we can use MLflow Projects to package the development and training of machine learning models into an encapsulated, reusable module, allowing us to train several versions of a model in parallel and compare their performance in real time during training. MLflow also gives us the ability to keep track of the parameters, metrics, and models associated with each historical project run, which means we can easily reproduce any previous version of the model if needed. It’s also worth noting that although we used TensorFlow exclusively in this article, MLflow Tracking and MLflow Projects work with all the major ML frameworks.

Thanks for reading! I’ll leave links to repositories containing the end-to-end project described in this article, as well as a few helpful documents I referenced while working on this. Feel free to reach out with any questions or comments.

GitHub Repositories

References
