Definitive tutorial for advanced use of MLflow

Introduction
MLflow is a powerful tool that is often talked about for its experiment tracking capabilities. And it’s easy to see why – it’s a user-friendly platform for logging all the important details of your Machine Learning experiments, from hyper-parameters to models. But did you know that MLflow has more to offer than just experiment tracking? This versatile framework also includes features such as MLflow Projects, the Model Registry, and built-in deployment options. In this post, we’ll explore how to utilise all of these features to create a complete and efficient ML pipeline.
For complete MLflow beginners, this tutorial might be too much, so I highly encourage you to watch these two videos before diving into this one!
Setup
For this project we’ll be working locally, so make sure to properly set up your local environment. There are three main dependencies that the project requires – MLflow, [pyenv](https://github.com/pyenv/pyenv#installation), and [kaggle](https://github.com/Kaggle/kaggle-api). While MLflow can be installed simply using pip, you’ll need to follow separate instructions to set up pyenv and kaggle.
Once you’re done with the installations, make sure to pull the latest version of this repo. When you have the repo on your laptop, we’re finally ready to begin!
Project Overview
Move to the mlflow_models folder and you’ll see the following structure:

Here’s a brief overview of each file in this project:
- MLProject – a yaml-styled file describing the MLflow Project
- python_env.yaml – lists all the environment dependencies needed to run the project
- train_hgbt.py and train_rf.py – training scripts for the HistGradientBoostedTrees and RandomForest models using specific hyperparameters
- search_params.py – script to perform the hyperparameter search
- utils – folder containing all the utility functions used in the project
As stated before, this project is end-to-end, so we’re going to go from data download to model deployment. The approximate workflow is going to be like this:
- Download data from Kaggle
- Load and pre-process the data
- Tune Random Forest (RF) model for 10 iterations
- Register the best RF model and put it into production bucket
- Deploy the models using in-built REST API
After we’re done, you can repeat steps 2 to 5 for the HistGradientBoostedTrees model on your own. Before jumping into the project, let’s see how these steps can be supported by MLflow.
MLflow Components
Generally speaking, MLflow has 4 components – Tracking, Projects, Models, and Registry.

Thinking back to the project steps, here’s how we’re going to be using each of them. First of all, I’ve used MLflow Projects to package up the code so that you or any other data scientist/engineer can reproduce the results. Second, the MLflow Tracking server is going to track your tuning experiments. This way, you’ll be able to retrieve the best experiments in the next step, where you’ll add your models to the Model Registry. From the registry, deploying the models will literally be a one-liner because of the MLflow Models format they’re saved in and their in-built REST API functionality.

Pipeline Overview
Data
Data will be downloaded automatically when you run the pipeline. As an illustrative example, I’ll be using a Loan Default dataset (CC0: Public Domain license), but you can adjust this by re-writing the training_data parameter and changing the column names to the relevant ones.
MLProject & Environment Files
The MLProject file gives you a convenient way to manage and organise your machine learning projects by letting you specify important details such as the project name, the location of your Python environment, and the entry points of your pipeline. Each entry point can be customised with a unique name, relevant parameters, and a specific command. The command is the shell line that gets executed whenever the entry point is invoked, and it can make use of the previously defined parameters.
The python_env.yaml file specifies the exact Python version needed to run the pipeline, along with a list of all the required packages.
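To make this concrete, here’s a rough sketch of what these two files might look like. The entry-point names, parameters and package list below are illustrative (based on the commands used later in this post), so refer to the repo for the exact versions.

```yaml
# MLProject (illustrative sketch)
name: mlflow_models
python_env: python_env.yaml
entry_points:
  train_rf:
    parameters:
      training_data: {type: string, default: "data/train.csv"}
      max_depth: {type: float, default: 10}
    command: "python train_rf.py --training-data {training_data} --max-depth {max_depth}"
  search_params:
    parameters:
      training_data: {type: string, default: "data/train.csv"}
      model_type: {type: string, default: "rf"}
      max_runs: {type: float, default: 10}
    command: "python search_params.py --training-data {training_data} --model-type {model_type} --max-runs {max_runs}"
---
# python_env.yaml (illustrative sketch)
python: "3.10"
build_dependencies:
  - pip
dependencies:
  - mlflow
  - scikit-learn
  - hyperopt
  - pandas
  - kaggle
```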
These two files are needed to create the environment for running the project. Now, let’s see the actual scripts (entry points) that the pipeline will be executing.
Training and Experiment Tracking
Training is done in the train_rf.py and train_hgbt.py scripts. Both of them are largely the same, with the exception of the hyper-parameters that get passed and the pre-processing pipelines. Consider the function below, which downloads the data and trains a Random Forest model.
The experiment starts when we define the MLflow context using with mlflow.start_run(). Under this context, we use mlflow.log_metrics to save the PR AUC metrics (check out the eval_and_log_metrics function for more information) and mlflow.sklearn.log_model to save the pre-processing and modelling pipeline. This way, when we load the pipeline, it will do all the pre-processing together with the inference. Quite convenient if you ask me!
Hyper-parameter Tuning
Hyper-parameter tuning is done with the Hyperopt package in search_params.py. A lot of the code is borrowed from the official MLflow repo, but I’ve tried to simplify it quite a bit. The trickiest part of this script is understanding how to structure the tuning rounds so that they appear connected to the "main" project run. Essentially, when we run search_params.py using MLflow, we want the structure of the experiments to be as follows:

As you can see, the search_params script does nothing but specify which parameters train_rf.py should use next (e.g. depths of 10, 2 and 5) and what its parent run ID should be (in the example above it’s 1234). When you explore the script, make sure to pay attention to the following details.
- When we define the mlflow.start_run context, we need to make sure that the nested parameter is set to True
- When we run train_rf.py (or train_hgbt.py), we explicitly pass the run_id and make it equal to the previously created child_run run
- We also need to pass the correct experiment_id
Please see the example below to understand how it all works in code. The eval function is the one that will be optimised by the Hyperopt minimisation function.
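Here’s a sketch of what that objective might look like. The entry-point name, parameter names and metric key are assumptions carried over from the earlier sketch, but the two important pieces are real MLflow mechanics: the nested child run and the run_id that gets forwarded to the training entry point.

```python
import mlflow
from mlflow.tracking import MlflowClient


def build_eval(experiment_id, training_data):
    """Return the objective function that Hyperopt will minimise."""

    def eval(params):
        # Nested child run, so it shows up under the "parent" tuning run in the UI
        with mlflow.start_run(experiment_id=experiment_id, nested=True) as child_run:
            # Launch the training entry point and attach it to the child run
            submitted = mlflow.projects.run(
                uri=".",
                entry_point="train_rf",
                run_id=child_run.info.run_id,
                experiment_id=experiment_id,
                parameters={
                    "training_data": training_data,
                    "max_depth": str(int(params["max_depth"])),
                },
                synchronous=True,
            )
            test_pr_auc = MlflowClient().get_run(submitted.run_id).data.metrics["test_pr_auc"]
        # Hyperopt minimises, so return the negative of the metric we want to maximise
        return -test_pr_auc

    return eval
```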
The actual tuning function is relatively simple. All we do is initialise an MLflow experiment run (the parent run of all the other runs) and optimise the objective function using the provided search space.
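Continuing the sketch, the tuning entry point might boil down to something like this (the search space is an assumption):

```python
import mlflow
from hyperopt import fmin, hp, tpe


def tune(training_data, max_runs=10):
    # Illustrative search space for the Random Forest
    space = {"max_depth": hp.quniform("max_depth", 2, 30, 1)}

    # Parent run: every child run created inside build_eval is nested under it
    with mlflow.start_run() as parent_run:
        experiment_id = parent_run.info.experiment_id
        best = fmin(
            fn=build_eval(experiment_id, training_data),  # build_eval comes from the sketch above
            space=space,
            algo=tpe.suggest,
            max_evals=max_runs,
        )
        mlflow.set_tag("best_params", str(best))

    return best
```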
Please note that these functions are just meant to illustrate the main parts of the code. Refer to the GitHub repository for the full versions.
Run the RF Pipeline
By now you should have a general idea about how the scripts work! So, let’s run the pipeline for Random Forest using this line:
mlflow run -e search_params --experiment-name loan . -P model_type=rf
Let’s decompose this command line:
- mlflow run . means that we want to run the Project in this folder
- -e search_params specifies which of the entry points in the MLProject file we want to run
- --experiment-name loan makes the experiment name equal to "loan". You can set it to whatever you want, but write it down since you’ll need it later
- -P model_type=rf sets the model_type parameter in the search_params script to "rf" (aka Random Forest)
When we run this line, four things should happen:
- Python virtual environment will get created
- New experiment called "loan" will get initialised
- Kaggle data will get downloaded into a newly created data folder
- Hyperparameter search will begin
When the experiments are done, we can check the results in the MLflow UI. To access it, simply use the command mlflow ui in your command line. In the UI, select the "loan" experiment (or whatever you’ve called it) and add your metric to the experiments view.

The best RF model achieved a test PR AUC of 0.104 and took 1 minute to train. Overall, the hyper-parameter tuning step took roughly 5 minutes to complete.
Register the Model
By now, we have trained, evaluated and saved 10 Random Forest models. In theory, you could simply go to the UI, find the best model, manually register it in your Model Registry and promote it to production. However, a better way is to do it in code, since then you can automate this step. This is exactly what the model_search.ipynb notebook covers. Use it to follow along with the sections below.
First of all, we need to find the best model. To do it programmatically, you need to gather all the hyperparameter tuning experiments (10 of them) and sort them by the test metric.
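Something along these lines does the trick with mlflow.search_runs; the parent run ID placeholder and the metric name are assumptions, so adjust them to match your own runs:

```python
import mlflow

experiment = mlflow.get_experiment_by_name("loan")

# All child runs of the tuning parent run, best test PR AUC first
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.mlflow.parentRunId = '<parent_run_id>'",  # replace with your parent run ID
    order_by=["metrics.test_pr_auc DESC"],
)
best_run = runs.iloc[0]
print(best_run["run_id"], best_run["metrics.test_pr_auc"])
```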
Your results will be different, but the main goal here is to end up with the correct best_run parameter. Please note that if you’ve changed the experiment name, you’ll need to change it in this script as well. The parent run IDs can be looked up in the UI if you click on the parent experiment (in this case named "capable-ray-599").

To test if your model is working as expected, we can easily load it into the notebook.
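For example, something like this (continuing from the search snippet above, with an illustrative data path and target column):

```python
import mlflow
import pandas as pd

# Load the full pre-processing + model pipeline from the best run
model = mlflow.sklearn.load_model(f"runs:/{best_run['run_id']}/model")

# Score a few rows of the training data as a sanity check
sample = pd.read_csv("data/train.csv").drop(columns=["loan_status"]).head(5)
print(model.predict_proba(sample)[:, 1])
```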
If you managed to get the prediction – congrats, you’ve done everything correctly! Finally, registering the model and promoting it to Production is a piece of cake as well.
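Those two lines boil down to roughly the following, with loan_model being the model name used throughout the rest of this post:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model from the best run and promote the new version to Production
model_version = mlflow.register_model(f"runs:/{best_run['run_id']}/model", "loan_model")
MlflowClient().transition_model_version_stage(
    name="loan_model", version=model_version.version, stage="Production"
)
```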
Running these 2 lines of code registers your model and promotes it to the "Production" bucket internally. All this does is change the way we access the models and their metadata, but it’s incredibly powerful in the context of model versioning. For example, at any point we can compare version 1 with version 2 when it comes out.

If you go to the "Models" tab of the UI, you’ll indeed see that there is a model named loan_model and that its Version 1 is currently in the Production bucket. This means that we can now access the model by its name and stage, which is very convenient.
Serve the Model
The easiest way of serving the model is to do it locally. This is usually done to test the endpoint and to make sure that we get the expected outputs. Serving with MLflow is quite easy, especially since we’ve already registered the model. All you need to do is run this command line:
mlflow models serve --model-uri models:/loan_model/Production -p 5001
This line will start a local server that hosts your model (the one called loan_model and currently in the Production stage) on port 5001. This means that you’ll be able to send requests to the localhost:5001/invocations endpoint and get predictions back (given that the requests are properly formatted).
To test the endpoint locally, you can use the requests library to call it and get the predictions.
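Here’s a sketch of such a call. It assumes the dataframe_split JSON format used by MLflow 2.x’s scoring server (older versions expect a slightly different schema), plus the same illustrative data path and target column as before.

```python
import json
import pandas as pd
import requests

# Prepare a small payload in the format the MLflow scoring server expects
sample = pd.read_csv("data/train.csv").drop(columns=["loan_status"]).head(5)
payload = json.dumps({
    "dataframe_split": {
        "columns": sample.columns.tolist(),
        "data": sample.values.tolist(),
    }
})

response = requests.post(
    "http://localhost:5001/invocations",
    data=payload,
    headers={"Content-Type": "application/json"},
)
print(response.json())
```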
In the example above, we’re getting the same probability that we had before, but now these scores are produced by the local server and not your script. The inputs need to follow very specific guidelines, which is why the payload requires a few lines of pre-processing. You can read more about the expected formats for MLflow serving here.
Summary
If you’ve managed to get this far and everything is working – give yourself a nice round of applause! I know it was a lot to take in, so let’s summarise everything you’ve achieved so far.
- You’ve seen how to structure your project with MLflow Projects
- You understand where in the training scripts we log our parameters, metrics and models, and how search_params.py invokes train_rf.py
- You can now run the MLflow Projects and see the results in MLflow UI
- You know how to find the best model, how to add it to the model registry, and how to promote it to Production bucket programmatically
- You can serve the models from model registry locally and can call the endpoint to make a prediction
What Next?
I strongly recommend that you put your skills to the test by running the pipeline for the HistGradientBoostedTrees model and then deploying it on your own. All the necessary scripts are available to you, so all that remains is for you to configure the pipeline and complete the deployment. Give it a go, and if you encounter any challenges or have any questions, don’t hesitate to leave them in the comments section.