MLflow is a fantastic way to speed up your machine learning model development process through its powerful experimentation component, which enables Data Scientists to log the best-performing algorithms and parameter combinations and to iterate rapidly on model development.
This blog aims to show users how to get the most out of MLflow experiments. We will focus on the start_run() function and its parameters, which can enhance your experimentation process. Additionally, we will cover the search_runs() function, which provides an expansive view of your experimentation history and enables greater flexibility in analysis.
If you are new to MLflow, I suggest taking a look at the MLflow site, documentation, some blog posts or tutorial videos before jumping into this blog.

mlflow.start_run()
Most of these tricks are parameters of the start_run() function. We call this function to initiate our experiment run, and it becomes the active run where we can log parameters, metrics, and other information. This is the function I use most in MLflow and the one that offers the most immediate value to users.
1. run_id
The run_id is a UUID which is specific to each experiment run. Once a run has been initiated, it is not possible to overwrite properties such as the model type or parameter values. However, you can use the run_id to log additional values retrospectively, such as metrics, tags, or a description.
# Imports assumed by the snippets in this post (sklearn's diabetes dataset is used throughout)
import mlflow
from math import sqrt
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()

# Start MLflow run for this experiment

# End any existing runs
mlflow.end_run()

with mlflow.start_run() as run:
    # Turn autolog on to save model artifacts, requirements, etc.
    mlflow.autolog(log_models=True)
    print(run.info.run_id)

    diabetes_X = diabetes.data
    diabetes_y = diabetes.target

    # Split data into training and test sets, 3:1 ratio
    (
        diabetes_X_train,
        diabetes_X_test,
        diabetes_y_train,
        diabetes_y_test,
    ) = train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=42)

    alpha = 0.9
    solver = "cholesky"
    regr = linear_model.Ridge(alpha=alpha, solver=solver)
    regr.fit(diabetes_X_train, diabetes_y_train)
    diabetes_y_pred = regr.predict(diabetes_X_test)

    # Log desired metrics
    mlflow.log_metric("mse", mean_squared_error(diabetes_y_test, diabetes_y_pred))
    mlflow.log_metric(
        "rmse", sqrt(mean_squared_error(diabetes_y_test, diabetes_y_pred))
    )
In this case, we may also want to log our coefficient of determination (r²) value for this run:
with mlflow.start_run(run_id="3fcf403e1566422493cd6e625693829d") as run:
    mlflow.log_metric("r2", r2_score(diabetes_y_test, diabetes_y_pred))
The run_id can either be extracted with print(run.info.run_id) from the previous run, or by querying mlflow.search_runs(), but more on that later.
2. experiment_id
You can set the experiment that a run logs to in a few different ways in MLflow. The following command sets the experiment for all subsequent runs to "/mlflow_sdk_test".
mlflow.set_experiment("/mlflow_sdk_test")
This can also be configured on a run-by-run basis through the experiment_id parameter.
my_experiment = mlflow.set_experiment("/mlflow_sdk_test")
experiment_id = my_experiment.experiment_id
This value can then be reused when passed to start_run():
# End any existing runs
mlflow.end_run()

with mlflow.start_run(experiment_id=experiment_id) as run:
    # Turn autolog on to save model artifacts, requirements, etc.
    mlflow.autolog(log_models=True)
    print(run.info.run_id)

    diabetes_X = diabetes.data
    diabetes_y = diabetes.target

    # Split data into training and test sets, 3:1 ratio
    (
        diabetes_X_train,
        diabetes_X_test,
        diabetes_y_train,
        diabetes_y_test,
    ) = train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=42)

    alpha = 0.8
    solver = "cholesky"
    regr = linear_model.Ridge(alpha=alpha, solver=solver)
    regr.fit(diabetes_X_train, diabetes_y_train)
    diabetes_y_pred = regr.predict(diabetes_X_test)

    # Log desired metrics
    mlflow.log_metric("mse", mean_squared_error(diabetes_y_test, diabetes_y_pred))
    mlflow.log_metric(
        "rmse", sqrt(mean_squared_error(diabetes_y_test, diabetes_y_pred))
    )
    mlflow.log_metric("r2", r2_score(diabetes_y_test, diabetes_y_pred))
3. run_name
Specifying the name of your run gives you greater control over the naming process than relying on the default names generated by MLflow. This enables you to establish a consistent naming convention for experiment runs, similar to how you might manage other resources in your environment.
# Start MLflow run for this experiment

# End any existing runs
mlflow.end_run()

# Explicitly name runs (assumes something like "from datetime import date as dt" earlier)
today = dt.today()
run_name = "Ridge Regression " + str(today)

with mlflow.start_run(run_name=run_name) as run:
    # Turn autolog on to save model artifacts, requirements, etc.
    mlflow.autolog(log_models=True)
    print(run.info.run_id)

    diabetes_X = diabetes.data
    diabetes_y = diabetes.target

    # Split data into training and test sets, 3:1 ratio
    (
        diabetes_X_train,
        diabetes_X_test,
        diabetes_y_train,
        diabetes_y_test,
    ) = train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=42)

    alpha = 0.5
    solver = "cholesky"
    regr = linear_model.Ridge(alpha=alpha, solver=solver)
    regr.fit(diabetes_X_train, diabetes_y_train)
    diabetes_y_pred = regr.predict(diabetes_X_test)

    # Log desired metrics
    mlflow.log_metric("mse", mean_squared_error(diabetes_y_test, diabetes_y_pred))
    mlflow.log_metric(
        "rmse", sqrt(mean_squared_error(diabetes_y_test, diabetes_y_pred))
    )
    mlflow.log_metric("r2", r2_score(diabetes_y_test, diabetes_y_pred))
However, please be aware that run_name is not a unique constraint in MLflow. This means that you could have multiple runs (each with its own unique run ID) sharing the same name.

This also means that every time you execute a with block using the same run_name, MLflow creates a new run with that name rather than appending details to the existing run.
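If you do need to find every run sharing a name, one option is to filter on the mlflow.runName tag, which is where MLflow stores the run name. A rough sketch (the name pattern matches the runs created above):
# Find all runs in the current experiment whose name starts with "Ridge Regression"
matching_runs = mlflow.search_runs(
    filter_string="tags.mlflow.runName LIKE 'Ridge Regression%'"
)
print(len(matching_runs))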
This brings us nicely to the next parameter.
4. nested
You may be familiar with nested experiment runs if you've used the scikit-learn function GridSearchCV to perform hyperparameter optimisation.
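For example, with sklearn autologging enabled, a grid search is recorded as a single parent run with one child run per parameter combination evaluated. A rough sketch, reusing the data split from the earlier snippets (the parameter grid is purely illustrative):
from sklearn.model_selection import GridSearchCV

# Autologging turns the GridSearchCV fit into a parent run with nested child runs
mlflow.sklearn.autolog()

param_grid = {"alpha": [0.1, 0.5, 0.9], "solver": ["cholesky", "lsqr"]}
grid_search = GridSearchCV(
    linear_model.Ridge(), param_grid, scoring="neg_mean_squared_error", cv=5
)

with mlflow.start_run(run_name="Ridge Grid Search"):
    grid_search.fit(diabetes_X_train, diabetes_y_train)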
Nested experiments look something like the following in MLflow:

Note that the metrics here are saved against the parent run, which holds the best values recorded across the child runs; the child runs' own metric values are blank.
While nested experiments are excellent for evaluating and logging parameter combinations to determine the best model, they also serve as a great logical container for organizing your work. With the ability to group experiments, you can compartmentalize individual Data Science investigations and keep your experiments page organized and tidy.
# End any existing runs
mlflow.end_run()

# Explicitly name runs
run_name = "Ridge Regression Nested"

with mlflow.start_run(run_name=run_name) as parent_run:
    print(parent_run.info.run_id)

    with mlflow.start_run(run_name="Child Run: alpha 0.1", nested=True):
        # Turn autolog on to save model artifacts, requirements, etc.
        mlflow.autolog(log_models=True)

        diabetes_X = diabetes.data
        diabetes_y = diabetes.target

        # Split data into training and test sets, 3:1 ratio
        (
            diabetes_X_train,
            diabetes_X_test,
            diabetes_y_train,
            diabetes_y_test,
        ) = train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=42)

        alpha = 0.1
        solver = "cholesky"
        regr = linear_model.Ridge(alpha=alpha, solver=solver)
        regr.fit(diabetes_X_train, diabetes_y_train)
        diabetes_y_pred = regr.predict(diabetes_X_test)

        # Log desired metrics
        mlflow.log_metric("mse", mean_squared_error(diabetes_y_test, diabetes_y_pred))
        mlflow.log_metric(
            "rmse", sqrt(mean_squared_error(diabetes_y_test, diabetes_y_pred))
        )
        mlflow.log_metric("r2", r2_score(diabetes_y_test, diabetes_y_pred))
Should you need to add to this nested run, specify the parent run's run_id as a parameter in subsequent executions to append further child runs.
# End any existing runs
mlflow.end_run()

with mlflow.start_run(run_id="61d34b13649c45699e7f05290935747c") as parent_run:
    print(parent_run.info.run_id)

    with mlflow.start_run(run_name="Child Run: alpha 0.2", nested=True):
        # Turn autolog on to save model artifacts, requirements, etc.
        mlflow.autolog(log_models=True)

        diabetes_X = diabetes.data
        diabetes_y = diabetes.target

        # Split data into training and test sets, 3:1 ratio
        (
            diabetes_X_train,
            diabetes_X_test,
            diabetes_y_train,
            diabetes_y_test,
        ) = train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=42)

        alpha = 0.2
        solver = "cholesky"
        regr = linear_model.Ridge(alpha=alpha, solver=solver)
        regr.fit(diabetes_X_train, diabetes_y_train)
        diabetes_y_pred = regr.predict(diabetes_X_test)

        # Log desired metrics
        mlflow.log_metric("mse", mean_squared_error(diabetes_y_test, diabetes_y_pred))
        mlflow.log_metric(
            "rmse", sqrt(mean_squared_error(diabetes_y_test, diabetes_y_pred))
        )
        mlflow.log_metric("r2", r2_score(diabetes_y_test, diabetes_y_pred))
One thing to note about this approach is that your metrics will now be logged against each child run.
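If you later want to pull those child runs back programmatically, you can filter on the mlflow.parentRunId tag that MLflow sets on each nested child, using the search_runs() function covered next. A sketch, reusing the parent run_id from above:
# Retrieve every child run attached to the given parent run
child_runs = mlflow.search_runs(
    filter_string="tags.mlflow.parentRunId = '61d34b13649c45699e7f05290935747c'"
)
child_runs[["run_id", "metrics.mse", "metrics.rmse", "metrics.r2"]]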
5. mlflow.search_runs()
This trick uses the search_runs() function, which allows us to programmatically query the experiment data shown in the tracking UI. The results are returned in a tabular format that is easy to understand and manipulate.
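As well as returning everything, search_runs() accepts a filter_string to narrow the result set, for instance to runs that beat a metric threshold. A quick sketch (the threshold is arbitrary):
# Only return runs whose MSE is below an arbitrary threshold, best first
best_runs = mlflow.search_runs(
    filter_string="metrics.mse < 3000",
    order_by=["metrics.mse ASC"],
)
best_runs.head()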
In the example below, we select specific fields from the runs in our experiment and load them into a Pandas DataFrame. Notice that the available columns greatly exceed those shown in the experiments UI!
# Create DataFrame of all runs in *current* experiment
df = mlflow.search_runs(order_by=["start_time DESC"])

# Print a list of the columns available
# print(list(df.columns))

# Create DataFrame with subset of columns
runs_df = df[
    [
        "run_id",
        "experiment_id",
        "status",
        "start_time",
        "metrics.mse",
        "tags.mlflow.source.type",
        "tags.mlflow.user",
        "tags.estimator_name",
        "tags.mlflow.rootRunId",
    ]
].copy()
runs_df.head()
As this is a Pandas DataFrame, we can add columns that may be useful for analysis:
# Feature engineering to create some additional columns
runs_df["start_date"] = runs_df["start_time"].dt.date
runs_df["is_nested_parent"] = runs_df[["run_id", "tags.mlflow.rootRunId"]].apply(
    lambda x: 1 if x["run_id"] == x["tags.mlflow.rootRunId"] else 0, axis=1
)
runs_df["is_nested_child"] = runs_df[["run_id", "tags.mlflow.rootRunId"]].apply(
    lambda x: 1
    if x["tags.mlflow.rootRunId"] is not None
    and x["run_id"] != x["tags.mlflow.rootRunId"]
    else 0,
    axis=1,
)
runs_df
If we want to aggregate the result set to show information about runs over time, we can use:
pd.DataFrame(runs_df.groupby("start_date")["run_id"].count()).reset_index()

The automatically logged tags.estimator_name field allows us to review how many runs have been executed with each algorithm.
pd.DataFrame(runs_df.groupby("tags.estimator_name")["run_id"].count()).reset_index()

Given this is a DataFrame, we can export the data for any reporting requirements, giving visibility to users who may not have access to the workspace and enabling comparison across workspaces.
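As a simple sketch of that, the DataFrame can be written out like any other (the file path is just an example):
# Export the run history for reporting or cross-workspace comparison
runs_df.to_csv("/tmp/mlflow_run_history.csv", index=False)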
Closing thoughts
These are just a few examples of how to extend your use of MLflow's functions and parameters in your experimentation process, but there are many more available in the Python API.
Hopefully, this post has inspired you to explore some of the available functions and parameters and see if they can benefit your model development process. For additional information, refer to the API documentation and experiment with different configurations to find the best fit for your needs.
If you’re currently using any functions or parameters that I’ve not mentioned in this post, please let me know in the comments!
All code can be found in my GitHub Repo.