What is MLOps?

What problems MLOps solves and best practices

Derrick Mwiti
Towards Data Science


Image by layer.ai

Building a machine learning model involves creating the model, training it, tuning, and deploying it. This process should be:

  • Scalable
  • Collaborative
  • Reproducible

For instance, it would be unfortunate to build an excellent model whose results cannot be reproduced in a production environment. The set of principles, tools, and techniques that make building machine learning models scalable, collaborative, and reproducible is referred to as MLOps. It is the machine learning counterpart of the practices the software engineering world calls DevOps.

Inspired by this paper. Image by layer.ai

DevOps vs. MLOps vs. DataOps

DevOps is a set of principles that ensures the continuous delivery of high-quality software. In the machine learning realm, these practices are referred to as MLOps. DataOps is a set of practices that ensures high-quality data is available for analysis and for training machine learning models. DataOps can be thought of as tightly integrated with MLOps.

Image by layer.ai

What problems does MLOps solve?

The set of tools and techniques defined in MLOps are geared towards making the lives of data scientists and machine learning practitioners easier. Let’s take a look at some of the problems that MLOps solves.

Versioning

Versioning is a common practice in software engineering where tools such as Git and GitHub are used to version code. Apart from versioning code, other things need to be versioned in machine learning. These items include:

  • Data used in model training
  • Model artifacts

Versioning models and data ensures that machine learning experiments are reproducible.
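
A minimal way to make data versioning concrete is to treat a content hash of the dataset as its version identifier and record it with every training run. The sketch below is one such approach; the file path is hypothetical.

```python
import hashlib

def dataset_version(path: str) -> str:
    """Compute a content hash that can serve as a dataset version ID."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]

# Hypothetical usage: tag a training run with the exact data it saw.
version = dataset_version("data/train.csv")
print(f"training on data version {version}")
```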

Monitoring model performance

Models placed in production can degrade over time. A common cause is a difference between the data the model was trained on and the data it sees in production, usually referred to as data drift. By monitoring the performance of a model, these issues can be identified and addressed quickly.
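
One simple way to detect data drift is a two-sample statistical test between training and live feature values. Below is a minimal sketch using SciPy's Kolmogorov-Smirnov test; the significance threshold of 0.05 is an assumption, and real systems typically test many features and aggregate the results.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.05):
    """Flag drift when the live distribution differs significantly
    from the training distribution (two-sample KS test)."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

# Simulated example: the live feature has shifted upward.
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=1_000)
live = rng.normal(0.5, 1.0, size=1_000)
print(feature_drifted(train, live))  # True: a shift this large is detected
```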

Feature generation

Creating features can be a time-consuming and compute-intensive task. Applying proper MLOps techniques ensures that features that are generated once can be reused as many times as needed. This frees up the data scientist to focus on designing and testing the model.

What skills do you need for MLOps?

MLOps is quite a broad field and requires a wide range of skills. Fortunately, you are not expected to have all of them; specializing in a couple of areas makes more sense. However, here are the skills an MLOps team needs to deliver a machine learning project successfully:

  • Ability to articulate the business problem and the objectives
  • Collect the data needed to solve the identified problem
  • Prepare and process the data into a form that machine learning models can consume
  • Create features that are important to the problem in question
  • Build and train machine learning models
  • Develop a pipeline for ingesting data, generating features, training, and evaluating the model
  • Deploy the model so that the actual users can use it. This can also be part of the above pipeline
  • Monitor how the model performs in the real world

With those basics out of the way, let’s now take a look at the main components of MLOps.

Parts of MLOps

Despite the field being quite broad, a couple of parts come together to make it one piece. In this section, we’ll explore those parts.

Image by layer.ai

Feature store

Also referred to as a feature factory, the feature store holds the features used in training a machine learning model. It is a critical part of MLOps because it ensures that the same features are not created over and over again. If necessary, features can also be fetched and used for building other models or for general analysis. Features are also versioned while in the feature store, so one can revert to a particular feature version that resulted in a better model.
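
To make the idea concrete, here is a toy in-memory feature store. A production feature store (Feast, for example) adds persistent storage, low-latency serving, and point-in-time correctness, but the write-once, reuse-many pattern is the same.

```python
from collections import defaultdict

class ToyFeatureStore:
    """In-memory stand-in for a feature store: features are written once
    under a name and version, then reused by any model that needs them."""

    def __init__(self):
        self._store = defaultdict(dict)  # feature name -> {version: values}

    def put(self, name: str, version: str, values) -> None:
        self._store[name][version] = values

    def get(self, name: str, version: str):
        return self._store[name][version]

store = ToyFeatureStore()
store.put("user_avg_spend", "v1", {"user_1": 42.0, "user_2": 13.5})
# Any later training job can fetch the exact same feature version.
print(store.get("user_avg_spend", "v1"))
```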

Data versioning

Apart from versioning features, the entire dataset used to create a certain model can also be versioned. Versioning data ensures that there is reproducibility in the process of creating models. It is also essential during auditing since it makes it easier to identify the datasets used to develop various models.

ML metadata store

To get rid of the magic involved in creating machine learning models, one has to log everything. Logging is critical for reproducibility. Some of the essential items to log include:

  • The seed used in splitting the data. This ensures that you are using the same split when creating a training and testing set
  • The random state used to initialize the model. The random state affects the reproducibility of model training
  • Model metrics
  • Hyperparameters
  • Learning curves
  • Training code and configuration files
  • Code used to generate features
  • Hardware logs

Storing model metadata is vital for various reasons:

  • Building dashboards that compare different models
  • Enabling the searchability of models based on hyperparameters
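
As an illustration, here is a minimal sketch of logging such metadata with MLflow (one of the tools covered later in this article); the parameter and metric values are placeholders.

```python
import mlflow

with mlflow.start_run(run_name="baseline"):
    # Log everything needed to reproduce the run.
    mlflow.log_param("split_seed", 42)     # seed used to split the data
    mlflow.log_param("random_state", 7)    # seed used to initialize the model
    mlflow.log_param("n_estimators", 100)  # example hyperparameter
    # Placeholder metric; in practice this comes from model evaluation.
    mlflow.log_metric("val_accuracy", 0.91)
```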

Model versioning

Versioning models is important because it enables switching between models in real-time. Apart from that, multiple models can be served to users at the same time to monitor performance. For instance, once a new model is available, it can be served to a few users to ensure that it performs as expected before rolling it out to everyone. Versioning is also critical from a compliance, governance, and historical point of view.

Model registry

Once a model has been trained, it is stored in a model registry. Every model in the registry will have a version for reasons already mentioned above. Each model should also be coupled with its:

  • Hyperparameters
  • Metrics
  • Feature version used to create the model
  • Dataset version used in training the model

…to mention a few.

The model metadata mentioned above is important for:

  • Compliance with regulations
  • Management of the models
  • Identifying the endpoints of models in production

Model artifacts will usually be saved automatically depending on the MLOps tool you are using. You can also instruct the tool to save the best model checkpoints and upload them to the registry.
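
As a rough sketch, registering a trained model with MLflow's model registry looks like the following. The run ID and model name are placeholders, and note that MLflow's registry needs a database-backed tracking server rather than plain local files.

```python
import mlflow

# Placeholder run ID: in practice this comes from the training run.
run_id = "abc123"
model_uri = f"runs:/{run_id}/model"

# Register the model under a name; the registry assigns version 1, 2, ...
result = mlflow.register_model(model_uri, "churn-classifier")
print(result.name, result.version)
```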

Model serving

Once a model is in the registry, it can be deployed and served to users. Serving a model means creating endpoints that can be used to run inference on the model. The model artifact can also be downloaded and packaged with an application. However, deploying API endpoints makes it easier to use the model in various applications. That said, there is a case for packaging the model inside an application, such as a mobile app, to reduce inference latency.
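
A minimal sketch of such an endpoint, using FastAPI and a scikit-learn model saved with joblib, could look like this. The model path and the flat feature-vector schema are assumptions.

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Hypothetical path: the artifact would come from the model registry.
model = joblib.load("model.joblib")

class Features(BaseModel):
    values: list[float]  # one flat feature vector

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn serve:app  (assuming this file is saved as serve.py)
```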

Model monitoring

Once machine learning models have been deployed, they have to be monitored for model drift and production skew. Model drift occurs when the statistical properties of the inference data diverge from those of the training data in unexpected ways, and the performance of the model degrades as a result. You can catch these problems early by monitoring the statistical properties of the training and prediction data.

Production skew occurs when the served model performs dismally compared to the offline model. This can be caused by bugs during the training process, serving bugs, or discrepancies between the training and inference data.

Model drift and production skew should be monitored to ensure that the model is behaving as expected.
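
A production-skew check can be as simple as comparing the metric recorded during offline evaluation against the same metric computed on labeled production traffic. The tolerance below is an assumption; pick one that matches your use case.

```python
def skew_alert(offline_accuracy: float, online_accuracy: float,
               tolerance: float = 0.05) -> bool:
    """Alert when the served model underperforms the offline
    evaluation by more than the allowed tolerance."""
    return (offline_accuracy - online_accuracy) > tolerance

# Placeholder numbers: offline evaluation said 0.92, production shows 0.81.
if skew_alert(0.92, 0.81):
    print("Production skew detected: check for training/serving mismatch")
```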

Model retraining

Machine learning models can be retrained for two main reasons:

  • To improve the performance of the model
  • When new training data becomes available

Your machine learning pipeline should detect the availability of new data or a drop in model performance and trigger retraining of the model. The system should also detect and deprecate models that would not benefit from retraining.
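
The trigger itself can be simple. Here is a sketch with placeholder thresholds; in practice the inputs would come from your monitoring system.

```python
def should_retrain(new_rows: int, online_accuracy: float,
                   min_new_rows: int = 10_000,
                   min_accuracy: float = 0.85) -> bool:
    """Trigger retraining when enough new data has arrived
    or the served model's accuracy has dropped too far."""
    return new_rows >= min_new_rows or online_accuracy < min_accuracy

# Placeholder readings from the monitoring system.
print(should_retrain(new_rows=2_000, online_accuracy=0.82))  # True
```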

CI/CD

Continuous integration and continuous deployment in machine learning ensure that high-quality models are created and deployed often. Continuous integration ensures that code is frequently merged into a central repository, where automated builds and tests are run. In machine learning, this involves testing not only the code but also the resulting models, and it entails packaging the models in readiness for use by actual users.

Continuous delivery involves automatically deploying code changes to a staging or production environment. In a machine learning pipeline, this means deploying a model to staging and/or production servers. Frequent deployments are significant because they ensure that code and models are tested rigorously and often before moving them to production.
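
To illustrate testing models alongside code, a CI job could run a pytest check like the one below. The accuracy threshold and the synthetic data are stand-ins for your own evaluation set and acceptance criteria.

```python
# test_model.py -- run with: pytest test_model.py
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def test_model_beats_minimum_accuracy():
    """CI gate: fail the build if the candidate model falls below
    a minimum acceptable accuracy on held-out data."""
    X, y = make_classification(n_samples=1_000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=42)
    model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
    assert model.score(X_test, y_test) >= 0.8
```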

How to implement MLOps

You can create a system yourself to implement the items we have mentioned above. Alternatively, you can use a machine learning pipeline orchestration platform that will make your workflow easier. Let's look at some of the best tools you can use for ML pipeline orchestration.

MLOps solutions

The choice of a machine learning orchestration tool will depend on a couple of factors, including:

  • The skills of your team
  • Your budget
  • Whether you want to automate part of your pipeline or the entire pipeline
  • The ease of integrating the new tool

Just to mention a few.

Let’s now mention some of the best tools you can use to orchestrate your machine learning pipeline.

  • MLflow is an open-source platform for managing the machine learning life cycle. It can be used for experiment tracking and deployment, and as a central model registry.
  • Sacred is an open-source library that can be used to organize, log, and reproduce machine learning experiments. It doesn't ship with a web UI; Omniboard is a popular front end for Sacred.
  • ModelDB is an open-source tool for versioning models, storing model metadata, and managing machine learning experiments. It can be used for making ML pipelines reproducible as well as for displaying model performance dashboards.

MLOps best practices

MLOps is a relatively new field; however, there are best practices that, when adhered to, will make your machine learning orchestration process successful. Let's mention some of them:

  • Use tools that are collaborative. This makes it easier for everyone on the team to access code, data, and information about the project, for example, the generated features. It also makes it easier to raise and track issues.
  • Start with a simple model. Starting with a simple model gives you adequate time to ensure that the infrastructure is right. Starting with a complex model means debugging a complicated model and optimizing the infrastructure it runs on at the same time.
  • Just launch. Don’t spend months on end building and deploying the machine learning model. It is better to launch the model as soon as possible to start testing it on actual users. You can serve the model to a small number of users to start getting initial feedback. That feedback can be used to iterate the model and infrastructure as necessary.
  • Perform automated regression tests. This is crucial to ensure that new code doesn’t introduce bugs in existing code. Code that fails the tests is not merged into the main source code. Regression testing ensures that new code doesn’t break existing functions.
  • Automate model deployments. This ensures that new models that pass certain tests become automatically available to users. It also frees engineers from the manual process of packaging models for production. The process involves automatically packaging models together with their dependencies and delivering them to production or staging environments. The models should be monitored constantly and automatically rolled back when they perform dismally.
  • Attach predictions to model versions and data. This makes it easier to track every prediction to a specific model and data. This is important for traceability, reproducibility, and compliance reasons. Logging predictions with data and model versions also makes it easy to debug models in the event of unexpected behavior.
  • Measure training and serving skew. Machine learning models may not always perform as expected when served on unseen data. As a result, it is crucial to measure the difference between performance on training data and on unseen data. If the difference is not acceptable by your standards, you will have to find a way to alleviate it, for instance by reworking the features.
  • Implement shadow production. This involves using production data to make predictions with a model whose predictions are not used to make real-world decisions. Instead, they are compared with the decisions made by the existing system, even if that system is manual. When the decisions made by the model are acceptable, it can be promoted to making real decisions.
  • Hyperparameter tuning optimization. Selecting and searching for the best model parameters by hand can be a pain in the neck. Automating the process will speed up your machine learning experimentation; the results of hyperparameter optimization can then be compared and the best algorithm and parameters chosen for production. A minimal sketch of automated search follows this list.
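
Here is a minimal sketch of automated hyperparameter search with scikit-learn's GridSearchCV; the parameter grid and the synthetic data are placeholders for your own search space and dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Search the grid automatically instead of tuning by hand.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```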

Final thoughts

This article has been a primer into the world of machine learning operations, popularly known as MLOps. We have covered various aspects of MLOps as well as a couple of best practices. We have also looked at some tools that you can use to automate the MLOps process. More specifically, you have learned:

  • What MLOps is
  • The difference between MLOps and DevOps
  • Problems that MLOps solves
  • Skills needed to operate in the MLOps space
  • Various components of MLOps
  • End-to-end MLOps solutions

Just to mention a few.

Follow me on LinkedIn for more technical resources.

Originally published at https://learn.layer.ai.

Images courtesy of Layer, with permission.
