
Have you ever found yourself training models, tuning hyperparameters, and selecting features for hours, only to realize you had already found a good set of parameters but forgot to track them or save the model? I know I have, maybe more often than I’d like to admit. Before you open a spreadsheet and start writing down which alpha values or `n_neighbors` your model uses, I’d like to introduce you to MLflow.
MLflow, developed by Databricks, is a versatile open-source platform designed to manage the end-to-end machine learning lifecycle. It offers an array of benefits to machine learning practitioners, data scientists, and developers, enabling streamlined experimentation, reproducibility, and deployment of ML models. So, let’s explore what it can do for you!
Main components of MLflow
Before we delve into the nuts and bolts of using MLflow, it’s crucial to understand what MLflow is and why it’s a critical tool in today’s ML landscape.
MLflow helps to manage the Machine Learning lifecycle, including experimentation, reproducibility, and deployment. It accommodates any (Python) machine learning library. It provides ready-to-use interfaces for the most common ones, giving it a high degree of versatility to suit all your development needs.
MLflow consists of four main components:
- MLflow Tracking: The main API, logs and organizes machine learning experiments. It records parameters, metrics, and artifacts (like models, notebooks, and scripts), enabling you to track your experiment runs and results. It comes with a UI you can access via localhost to view, visualize and manage your experiments.
- MLflow Projects: A code packaging format for reproducibility and sharing. It defines a standard structure for ML code, making it easier for you to understand it, reuse it, and collaborate with others.
- MLflow Models: A standard format for packaging ML models in multiple flavors (ML frameworks), and a repository for storing and sharing models. It simplifies model deployment across platforms.
- MLflow Model Registry: A centralized model repository with model lineage, versioning, stage transitions, and annotations. It’s particularly useful in a collaborative environment when you need to compare and combine your models with those of your team members.
MLflow’s strength lies in its ability to simplify and streamline the ML lifecycle, making it easy for you to track and compare experiments, reproduce code, manage models, and deploy solutions with relative ease.
Diving into MLflow Tracking: Manage your ML Experiments
Let’s kickstart the practical part of this tour with MLflow Tracking, the API and UI for logging and managing everything around your ML experiments.
You can use it in a simple script or scale it up for large-scale training environments. Let me illustrate the basic usage with a simple code snippet using ElasticNet regression from Scikit-learn.
```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# X_train, X_test, y_train, y_test are assumed to be defined already

# Initiate a new MLflow run
with mlflow.start_run():
    # Train and fit the model
    model = ElasticNet(alpha=0.5, l1_ratio=0.5)
    model.fit(X_train, y_train)

    # Make predictions and calculate the RMSE
    predictions = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, predictions))

    # Log parameters
    mlflow.log_param("alpha", 0.5)
    mlflow.log_param("l1_ratio", 0.5)

    # Log metric
    mlflow.log_metric("rmse", rmse)

    # Log model (artifact)
    mlflow.sklearn.log_model(model, "model")
```
After running this script, you can view the logged run details in your MLflow tracking UI. The user-friendly interface allows you to filter and sort the runs based on different parameters and metrics, enabling a comparative analysis of various runs.

MLflow Model Registry: Streamline your Model Management
After getting started with MLflow Tracking, you may want to start organizing your models as well. This is where the MLflow Model Registry, a centralized model repository that integrates closely with MLflow Tracking, comes in. It is a great tool for individuals and teams to review, share, and collaborate on ML models.
The model registry streamlines the transition of models from experimentation to production. It achieves this by allowing model lineage tracking, model versioning, stage transitions, and model annotations.
You can use the model registry like this:
1. Log a model from MLflow Tracking:

```python
mlflow.sklearn.log_model(lr_model, "model")
```
2. Register a logged model in the registry:

```python
result = mlflow.register_model(
    "runs:/d16076a3ec534311817565e6527539c0/model",
    "ElasticNetWineModel"
)
```

The model is registered using its run ID (`d16076a3ec534311817565e6527539c0` in the example).
3. List all registered models:

You can check whether your model has been registered successfully by listing all registered models:

```python
client = mlflow.tracking.MlflowClient()
client.search_registered_models()
```
4. Load the model from the registry:

You can load the model from the registry for prediction or scoring:

```python
model_uri = "models:/ElasticNetWineModel/1"
model = mlflow.pyfunc.load_model(model_uri)
```
5. Transition the model versions between stages:

The Model Registry allows for model stage transitions. You can transition a model from ‘None’ to ‘Staging’, ‘Production’, or ‘Archived’:

```python
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="ElasticNetWineModel",
    version=1,
    stage="Production",
)
```
In this example, we transitioned version 1 of our model into the ‘Production’ stage for later usage.
Understanding MLflow Models
How exactly does MLflow save your models?
MLflow Models offer a standard format for packaging your machine learning models that can be used in a variety of downstream tools. For instance, real-time serving through a REST API, batch inference on a Spark cluster, and more.
MLflow Models use a simple convention for packaging models where each model is saved as a directory containing any files necessary and a descriptor file that lists several "flavors" the model can be used in.
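For the scikit-learn model logged earlier, that descriptor file (named `MLmodel`) is a small YAML document that might look roughly like this; the exact fields, paths, and versions are illustrative and vary with your MLflow and library versions:

```yaml
artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.10.12
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 1.3.0
run_id: d16076a3ec534311817565e6527539c0
```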
A flavor defines a specific model format or library the model can run with. There are flavors for the most important libraries. For example, a TensorFlow model can be loaded as a TensorFlow SavedModel format or as a Python function to be applied to input data. If you have a model from a custom library or one that isn’t built-in you can still use it in the generic Python function flavor:
```python
import mlflow.pyfunc

class ModelWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Initialization logic, e.g. loading weights or artifacts
        pass

    def predict(self, context, model_input):
        # Prediction logic
        pass

# Saving the model
mlflow.pyfunc.save_model(path="model_path", python_model=ModelWrapper())
```
When you’re ready to serve your model (e.g. for an API), you can use the `mlflow models serve` command:

```shell
mlflow models serve -m models:/ElasticNetWineModel/1 -p 1234
```

This command serves the specified version of the model (1 in this case) on localhost at port 1234.
Exploring MLflow Projects: Simplify your Code Packaging
As stated above, MLflow Projects is a format for packaging your code in a reusable and reproducible way. It mainly helps you share projects across teams and enables running them on different platforms. Each project is simply a directory or repository that contains your code and a descriptor file named `MLproject`.

The `MLproject` file defines the project’s structure, including its dependencies, entry points, and parameters for your code. Your `MLproject` file might look like this:
```yaml
name: My_Project
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: float                            # no default value
      l1_ratio: {type: float, default: 0.5}   # default value
    command: "python main.py {alpha} {l1_ratio}"  # run script with params
```
In this example, `main.py` is the entry point of the project. `alpha` and `l1_ratio` are parameters for this script. The `conda.yaml` file lists the project’s Python dependencies.
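The `conda.yaml` file itself might look like this; the environment name and package versions are illustrative:

```yaml
name: my_project_env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - scikit-learn
```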
You can run the project’s entry point via the `mlflow run` command:

```shell
mlflow run . -P alpha=0.42
```
The command fetches the necessary dependencies, runs your code with the given and/or default parameters, and logs the results to the tracking server.
Wrapping Up
This concludes our whirlwind tour of MLflow.
We’ve now covered all four components of MLflow: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Model Registry. These tools come together to provide you with a comprehensive platform for managing the machine learning lifecycle, facilitating better collaboration, code reusability, and experiment tracking.
Whether you’re working alone or in multiple teams, dealing with simple models or complex ML pipelines, I hope you find MLflow to be a worthy tool to integrate into your workflow.
Remember, I merely scratched the surface of MLflow’s capabilities here. For a deeper dive into advanced features, examples and usage, check out the official MLflow documentation or possible follow up posts. As always I also hope you learned something that helps you along the way. Enjoy exploring and harnessing the power of MLflow in your machine-learning projects!
-Merlin