
Improve your MLflow experiment, keeping track of historical metrics

Today, we are going to extend our SDK with functions for keeping track of runs' metrics interactively, and we'll finally run our SDK

Image by Sahand Babali on Unsplash


Welcome back to the second part of our journey in MLflow. Today we’ll extend the current SDK implementation with two functions for reporting historical metrics and custom metrics. Then, we’ll finally see the SDK working with a simple example. Next time, we’ll dig into MLflow plugins and create a "deployment" plugin for the GCP AI Platform.

Here is my first article about the MLflow SDK creation:

Scale-up your models development with MLflow

Table of Contents

  • What do we need today
  • Report experiment’s runs metrics to the most recent run
  • Report custom metrics to a run
  • Update the experiment tracking interface
  • Create your final MLflow SDK and install it
  • SDK in action!

What do we need today

Firstly, let’s think of the design of the main SDK protocol. The aim today is to allow data scientists to:

  1. add to a given experiment’s run the historical metrics computed in previous runs
  2. add custom computed metrics to a specific run

Thus, we can think of implementing the two following functions:

  • report_metrics_to_experiment : this function collects all the metrics from the previous experiment’s runs and groups them in an interactive plot, so users can immediately spot issues and understand the overall trend
  • report_custom_metrics : this function reports data scientists’ metric annotations, posting a dictionary of custom metrics to a given experiment’s run. This may be useful when a data scientist wants to attach to a specific run some metrics computed on unseen data.

Report experiment’s runs metrics to the most recent run

This function makes use of MlflowClient , the client that MLflow Tracking provides to manage experiments and their runs. From MlflowClient we can retrieve all the runs of a given experiment and, from there, extract each run’s metrics. Once we have gathered all the metrics, we can proceed with a second step, where we use plotly to produce an interactive HTML plot. In this way, users can analyse every single data point for all the runs in the MLflow server artefacts box.

Fig.1 shows the first part of the report_metrics_to_experiment function. Firstly, the MlflowClient is initialised with the given input tracking_uri . Then, the experiment’s information is retrieved with client.get_experiment_by_name and converted to a dictionary. From here, each experiment’s run is listed in runs_list . Each run has its run_id , which is handy for storing metrics information in a dictionary models_metrics . Additionally, the metrics can be accessed via run.to_dictionary()['data']['metrics'] , which returns the names of the logged metrics.

From the metric’s name, the metric’s data points can be retrieved through client.get_metric_history() . This call returns the steps and the values of the metric, so we can append them to lists and save them in models_metrics[single_run_id][metric] = [x_axis, y_axis] .
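As a reference, here is a minimal sketch of this first part, assuming a signature report_metrics_to_experiment(experiment_name, tracking_uri) and using only standard MlflowClient calls; it illustrates the steps above rather than reproducing the exact SDK code.

from mlflow.tracking import MlflowClient


def report_metrics_to_experiment(experiment_name, tracking_uri):
    # connect to the tracking server and fetch the experiment metadata
    client = MlflowClient(tracking_uri=tracking_uri)
    experiment = dict(client.get_experiment_by_name(experiment_name))

    # list all the runs belonging to this experiment
    runs_list = client.search_runs([experiment["experiment_id"]])

    models_metrics = {}
    for run in runs_list:
        single_run_id = run.info.run_id
        models_metrics[single_run_id] = {}
        # metric names logged for this run
        run_metrics = run.to_dictionary()["data"]["metrics"]
        for metric in run_metrics:
            # full history of the metric: one entry per logged step
            history = client.get_metric_history(single_run_id, metric)
            x_axis = [point.step for point in history]
            y_axis = [point.value for point in history]
            models_metrics[single_run_id][metric] = [x_axis, y_axis]
    return models_metrics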

Fig.2 shows the second part of report_metrics_to_experiment . Firstly, a new plotly figure is initialised with fig = go.Figure() . Metrics are then read from models_metrics and added as scatter traces. The final plot is saved in HTML format, to have an interactive visualization.
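A possible sketch of this plotting step, here split out as a helper for readability; the trace naming and layout are my own choices, while the key plotly calls are go.Figure , add_trace and write_html .

import plotly.graph_objects as go


def plot_metrics_history(models_metrics, output_html="metrics_history.html"):
    fig = go.Figure()
    for single_run_id, metrics in models_metrics.items():
        for metric_name, (x_axis, y_axis) in metrics.items():
            # one scatter trace per (run, metric) pair
            fig.add_trace(
                go.Scatter(
                    x=x_axis,
                    y=y_axis,
                    mode="lines+markers",
                    name=f"{single_run_id[:8]} / {metric_name}",
                )
            )
    fig.update_layout(xaxis_title="step", yaxis_title="metric value")
    # save as HTML so the plot stays interactive in the artefacts box
    fig.write_html(output_html)
    return output_html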

Report custom metrics to a run

The final function we are going to implement today reports a custom input to a specific run. In this case, a data scientist may have some metrics obtained by evaluating a run’s model on unseen data. This function is shown in fig.3. Given an input dictionary custom_metrics (e.g. {"accuracy_on_datasetXYZ": 0.98} ), the function uses MlflowClient to log_metric for a specific run_id .
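A minimal sketch of report_custom_metrics , assuming it receives the tracking URI, the target run_id and the custom_metrics dictionary:

from mlflow.tracking import MlflowClient


def report_custom_metrics(tracking_uri, run_id, custom_metrics):
    client = MlflowClient(tracking_uri=tracking_uri)
    for metric_name, metric_value in custom_metrics.items():
        # attach each custom metric to the given run
        client.log_metric(run_id, metric_name, metric_value)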

Update the experiment tracking interface

Now that the two new functions have been added to the main MLflow protocol, let’s encapsulate them in our experiment_tracking_training.py . In particular, end_training_job could call report_metrics_to_experiment , so that, at the end of any training, we keep track of all the historical metrics for a given experiment, as shown in fig.4.
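Something along these lines, reusing the two helpers sketched above; the tracking_params keys and the assumption that end_training_job is called while the MLflow run is still active are mine, not the article’s exact code:

import mlflow


def end_training_job(tracking_params):
    # collect the historical metrics of the experiment and build the HTML report
    models_metrics = report_metrics_to_experiment(
        tracking_params["experiment_name"], tracking_params["tracking_uri"]
    )
    html_report = plot_metrics_history(models_metrics)
    # push the interactive report to the current run's artefacts and close the run
    mlflow.log_artifact(html_report)
    mlflow.end_run()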

Additionally, to allow users to add their own metrics to specific runs, we can think of an add_metrics_to_run function, which receives as input the experiment tracking parameters, the run_id we want to work on and the custom dictionary custom_metrics (fig.5):
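A possible sketch, simply forwarding to report_custom_metrics (again, the tracking_params keys are an assumption):

def add_metrics_to_run(tracking_params, run_id, custom_metrics):
    # thin wrapper around the SDK-level reporting function
    report_custom_metrics(
        tracking_uri=tracking_params["tracking_uri"],
        run_id=run_id,
        custom_metrics=custom_metrics,
    )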

Create your final MLflow SDK and install it

Putting all the pieces together, the SDK package should be structured in a similar way:

mlflow_sdk/
           mlflow_sdk/
                      __init__.py
                      ExperimentTrackingInterface.py
                      experiment_tracking_training.py
           requirements.txt
           setup.py
           README.md

The requirements.txt file contains all the packages we need to install our SDK; in particular, you’ll need numpy, mlflow, pandas, matplotlib, scikit_learn, seaborn and plotly by default.

setup.py allows you to install your own MLflow SDK in a given Python environment, and the script should be structured in this way:
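For example, a minimal setup.py could look like the following sketch; the package name, version and description are placeholders to adapt to your own SDK:

from setuptools import find_packages, setup

# read the pinned dependencies from requirements.txt
with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="mlflow_sdk",
    version="0.1.0",
    description="A thin SDK around MLflow Tracking for experiment reporting",
    packages=find_packages(),
    install_requires=requirements,
)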

To install the SDK, just use your system Python or a virtualenv Python: python setup.py install

SDK in action!

It’s time to put our MLflow SDK into action. We’ll test it with a sklearn.ensemble.RandomForestClassifier and the iris dataset¹ ² ³ ⁴ ⁵ (source and license: Open Data Commons Public Domain Dedication and License). Fig.7 shows the full example script we are going to use (my script name is 1_iris_random_forest.py).

tracking_params contains all the relevant info for setting up the MLflow connection, as well as the run and experiment names. After loading the dataset, we create a train/test split with sklearn.model_selection.train_test_split . To show different metrics and plots in the MLflow artefacts, I ran 1_iris_random_forest.py 5 times, varying test_size with the following values: 0.3, 0.2, 0.1, 0.05, 0.01

Once the data have been set up and the classifier defined, clf = RandomForestClassifier(n_estimators=2) , we can call experiment_tracking_training.start_training_job . This module interacts with the MLflow context manager and reports to the MLflow server the script that is running the model, as well as the model’s info and artefacts.

At the end of the training we want to report all the experiment runs’ metrics in a single plot and, just for testing, we are also going to save some "fake" metrics like false_metrics = {"test_metric1": 0.98, ... }
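For orientation, here is a condensed sketch of the example script; the tracking_params keys and the start_training_job / end_training_job signatures are my assumptions about the SDK interface, not the article’s exact code:

# 1_iris_random_forest.py (condensed sketch)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from mlflow_sdk import experiment_tracking_training

tracking_params = {
    "tracking_uri": "http://localhost:5000",
    "experiment_name": "random_forest",
    "run_name": "iris_random_forest",
}

# load the iris dataset and create the train/test split
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3  # repeat with 0.2, 0.1, 0.05, 0.01
)

clf = RandomForestClassifier(n_estimators=2)

# the SDK opens the MLflow run, logs the running script, trains the model
# and stores the model's info and artefacts
experiment_tracking_training.start_training_job(
    clf, X_train, y_train, X_test, y_test, tracking_params
)

# "fake" metrics to test the custom metrics reporting; they would be passed
# to add_metrics_to_run together with the run_id of the run above
false_metrics = {"test_metric1": 0.98}

# close the run and report all the historical metrics of the experiment
experiment_tracking_training.end_training_job(tracking_params)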

Before running 1_iris_random_forest.py , open up the connection with the MLflow server in a new terminal tab with mlflow ui and navigate to http://localhost:5000 or http://127.0.0.1:5000 . Then, run the example above as python 1_iris_random_forest.py and repeat the run 5 times for the different values of test_size

Fig.8: MLflow UI after running the example script

Fig.8 should be similar to what you see after running the example script. Under Experiments , the experiments’ names are listed. For each experiment there is a series of runs; in particular, under random_forest you’ll find your random forest runs from 1_iris_random_forest.py

For each run we can immediately see some parameters, which are automatically logged by mlflow.sklearn.autolog() , as well as our fake metrics (e.g. test_metric1 ). The autolog function also saves Tags , reporting the estimator class (e.g. sklearn.ensemble._forest.RandomForestClassifier ) and method ( RandomForestClassifier ).

Clicking on a single run shows more details. At first you’ll see all the model parameters, which, again, are automatically reported by the autolog function. Scrolling down the page, we can access the Metrics plots. In this case we have just a single data point, but for more complicated models you can have a full plot as a function of the number of steps.

Fig.9: Artifacts saved by the MLflow SDK. In the artifacts box you can find the code used to run the model (in my case 1_iris_random_forest.py), the model pickle files under the model folder, and all the interactive metrics plots as well as the confusion matrix.

The most important information will then be stored under the Artifacts box (fig.9). Here you can find different folders which have been created by our mlflow_sdk:

  • Firstly, code is a folder that stores the script used to run our model – this was done in experiment_tracking_training on line 24 with traceback (here the link) and pushed to the MLflow artefacts on line 31 of the run_training function (here the link).
  • Following, model stores the binary pickle files. MLflow automatically saves the model files as well as their requirements, to allow reproducibility of the results. This will be super helpful at deployment time.
  • Finally, you’ll see all the interactive plots ( *.html ) generated at the end of the training, as well as additional metrics we have computed during the training, such as training_confusion_matrix.png

As you can see, with minimal intervention we have added a full tracking routine to our ML models. Experimenting is crucial at development time and, in this way, data scientists can easily use the MLflow Tracking functionality without heavily modifying their existing code. From here you can explore different "shades" of reports, adding further information for each run, as well as running MLflow on a dedicated server to allow cross-team collaboration.

¹ Fisher, Ronald A. "The use of multiple measurements in taxonomic problems." Annals of eugenics 7.2 (1936): 179–188.

² Deming, W. Edwards. "Contributions to Mathematical Statistics. RA. New York: Wiley; London: Chapman & Hall, 1950. 655 pp. " Science 113.2930 (1951): 216–217.

³ Duda, R. O., and P. E. Hart. "Pattern Classification and Scene Analysis.(Q327. D83) John Wiley & Sons." (1973): 218.

⁴ Dasarathy, Belur V. "Nosing around the neighborhood: A new system structure and classification rule for recognition in partially exposed environments." IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1980): 67–71.

⁵ Gates, Geoffrey. "The reduced nearest neighbor rule (corresp.)." IEEE transactions on information theory 18.3 (1972): 431–433.


That’s all for today! I hope you enjoyed these two articles about MLflow and its SDK development. Next time we’ll dig into the MLflow plugins world, which, in theory, could take your team all the way to the deployment phase as well.

If you have any question or curiosity, just write me an email at [email protected]

