
I have lost a lot of time running my Jupyter notebook to build a machine learning model with the best performance. The best results usually depended on a particular combination of hyperparameters, and it was hard to keep track of all the experiments I had tried along the way.
Recently, I discovered MLflow, a very efficient open-source platform for managing the complete machine learning life cycle, whose main stages are gathering data, preparing data, creating the model, training the model, and deploying the model. It has very nice functionalities that I will explain in the following sections.
In this post, I am going to focus on MLflow's ability to track the training metrics and model hyperparameters of each experiment run. I will show you how to exploit this powerful tool through a simple example of training a regressor with Sklearn. The steps are the following:
- Introduction
- Prerequisites
- Train an MLflow model on train.py
- Visualize the results on the localhost
Introduction

As seen before, MLflow is a platform to manage the machine learning life cycle. It provides four components, that can work together or separately, depending on the application.
- MLflow Tracking is used to track and record experiments. While your code runs, you log parameters, metrics, and output files; later you can visualize the results of all the experiments on localhost. In this post, we are focusing on logging and querying experiments using Python.
- MLflow Projects is a component for packaging data science code in a reusable and reproducible way.
- MLflow Models is used to package machine learning models. There are multiple ways to save and load MLflow models. For example, we can train a Sklearn model and later log it as an MLflow artifact for the current run using the function mlflow.sklearn.log_model.
- Model Registry is a centralized store for managing the full lifecycle of your models. It records the chronology of the models produced, from staging to production, and lets you add a description to each model at any given run.
Prerequisites

In this tutorial, I am using Visual Studio Code as my IDE. It's free and open source, supports multiple languages (each language requires installing an extension), and can handle Jupyter notebooks, which are indispensable for manipulating and analyzing data. It also integrates a terminal, which we need in order to run a few commands to use MLflow, and it can manage the files and folders of your project. I suggest you check this easy tutorial for getting started using VSCode with Python. The principal steps are:
- Install Python from python.org if you don’t have it on your local machine
- Add the extension "Python" in VSCode
- Create a new Python environment
Once you have configured your Python environment, it's time to install MLflow:
pip install mlflow
To be sure that the installation worked, run mlflow --version in the terminal; it should print the installed version.

Whenever you run your program written in Python, the MLflow Python API logs runs to files in an mlruns directory.
Train an MLflow model in train.py
In this example, we are using the Bike Sharing dataset to predict the hourly number of rental bikes based on features like temperature, wind speed, and humidity. The dataset is stored in UCI's machine learning repository [2]. We are going to train a random forest regressor and tune three of its hyperparameters: n_estimators, max_features, and max_depth. All the code is saved in a Python file called train.py, and is shown below:
When you right-click on the code and select the option "Run Python File in Terminal", you should see a similar output:

The most fundamental part to focus on is the with mlflow.start_run() line, which automatically concludes the run at the end of the block. The methods mlflow.log_param and mlflow.log_metric log parameters and metrics under the current run.
To try out other values for the hyperparameters, we can pass them as command-line arguments:
python train.py 200 8 6
I suggest you try other values to get a better idea of the efficiency of MLflow.
Visualize the results on the localhost
To compare the different combinations of hyperparameters and models, we can run the command on the terminal:
mlflow ui
It returns a URL (http://localhost:5000) where you'll find a table that records all the hyperparameters and the corresponding metrics.

You will notice that all your experiments are now stored in this table. If you have duplicates of the same trial, you can delete one of them. Moreover, you can download all the results as a CSV file to quickly analyze the performance of your model(s).
Final thoughts:
In this post, we saw what MLflow is and how to exploit it to track and store our trials. I hope you find this tutorial useful for getting started with it. If you are interested in discovering other platforms to track experiments, check out this guide, which introduces Neptune.ai. Thanks for reading. Have a nice day!
References:
[2] https://www.capitalbikeshare.com/data-license-agreement
[3] Train, Serve, and Score a Linear Regression Model