
As the field of machine learning has matured in recent years, the need to bring automated continuous integration (CI), continuous delivery (CD) and continuous training (CT) into machine learning systems has grown. The application of the DevOps philosophy to a machine learning system has been termed MLOps. The aim of MLOps is to fuse machine learning system development (ML) with machine learning system operation (Ops).
What is DevOps?
DevOps is a practice used by individuals and teams when developing software systems. The benefits that individuals and teams can obtain through a DevOps culture and practice include:
- Rapid development life cycles
- Deployment velocity
- Code quality through the use of testing
To achieve these benefits, two key concepts are utilised throughout the development of the software system:
- Continuous integration: merging code changes into a central code repository, automating the build process of the software system and testing components of the code base.
- Continuous delivery: automating software deployments.
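To make the testing part of continuous integration concrete, here is a minimal sketch of the kind of unit test a CI job could run on every merge. The function `min_max_scale` and both tests are hypothetical examples written for illustration; they do not come from any particular project or library.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero, map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_min_max_scale_bounds():
    # The smallest value maps to 0, the largest to 1.
    assert min_max_scale([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]


def test_min_max_scale_constant_input():
    # A constant column must not raise a ZeroDivisionError.
    assert min_max_scale([3.0, 3.0]) == [0.0, 0.0]
```

A test runner such as pytest would collect the `test_` functions automatically, so the CI job only needs to run the test command and fail the build if any assertion fails.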
A machine learning system is similar but not completely identical to a software system. The key differences are:
- Skills existing within the team: Often the people who develop the machine learning model/algorithm do not come from a software engineering background and focus primarily on the proof of concept/prototyping phase.
- Machine learning systems are highly experimental in nature. There is no guarantee in advance that an algorithm will succeed; you have to run experiments to find out. Therefore, there is a need to track different experiments, feature engineering steps, model parameters, metrics etc.
- Testing a machine learning system goes beyond just unit testing. You also need to consider things like data checks, model drift and evaluating the performance of the model once it is deployed into production.
- Deployment of machine learning models is very bespoke, depending on the nature of the problem the model is trying to solve. It can involve multi-step pipelines that include data processing, feature engineering, model training, model registry and model deployment.
- There is a need to track the statistics and distribution of data over time to ensure that what the model sees today in a production environment is consistent with the data it was trained on.
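As a rough illustration of the last point, the sketch below compares the distribution of one numeric feature at training time with the same feature in production, using the Population Stability Index (PSI). This is one possible drift measure among several, not a method prescribed here. The function name, the number of bins and the small floor for empty bins are all assumptions made for the example.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two numeric samples; larger values mean a bigger shift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant training column

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp values outside the training range into the edge bins.
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the production values are shifted relative to training.
train_values = [i / 100 for i in range(1000)]
live_values = [x + 5 for x in train_values]
drift_score = population_stability_index(train_values, live_values)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, but the right threshold depends on the feature and the problem.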
Similarities of MLOps and DevOps
- Continuous integration of the code base amongst developers, data scientists and data engineers.
- Testing of the code and the components of the machine learning system.
- Continuous delivery of the system into production.
Differences between MLOps and DevOps
- In MLOps, in addition to testing the code you also need to ensure data quality is maintained across the machine learning project life cycle.
- In MLOps, you may not necessarily be deploying just a model artifact. Deployment of a machine learning system can require a machine learning pipeline that involves data extraction, data processing, feature engineering, model training, model registry and model deployment.
- In MLOps, there is a third concept that does not exist in DevOps: Continuous Training (CT). This step is all about automatically identifying scenarios/events that require a model to be re-trained and re-deployed into production because the performance of the currently deployed machine learning model/system has degraded.
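A continuous training trigger can be sketched as a simple rule that is evaluated whenever fresh monitoring figures are available. Everything in the example below is hypothetical: the `RetrainPolicy` class, the threshold values, and the assumption that a live accuracy figure and a drift score are already being computed elsewhere in the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrainPolicy:
    # Hypothetical thresholds; tune them for the problem at hand.
    min_accuracy: float = 0.85
    max_drift_score: float = 0.2


def should_retrain(live_accuracy, drift_score, policy=RetrainPolicy()):
    """Return (decision, reasons) for one monitoring check."""
    reasons = []
    if live_accuracy < policy.min_accuracy:
        reasons.append("accuracy below threshold")
    if drift_score > policy.max_drift_score:
        reasons.append("input data drift above threshold")
    return bool(reasons), reasons


# Example check: accuracy has dropped, input data still looks stable.
retrain, why = should_retrain(live_accuracy=0.80, drift_score=0.05)
```

In a real system the decision would typically kick off the training pipeline and, if the new model passes its evaluation, a re-deployment. Returning the reasons alongside the decision makes each trigger easier to audit later.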
If you are interested in the different ways you can deploy a machine learning model, please check out my article (https://towardsdatascience.com/machine-learning-model-deployment-options-47c1f3d77626).
Platforms and tools to assist with MLOps
- MLflow (https://mlflow.org/): This is an open source platform that assists with the model tracking, model registry and model deployment steps. I also believe this is what Azure Machine Learning uses on their platform. Databricks has also incorporated MLflow into its platform.
- Version control systems such as:
  - GitHub
  - GitLab
  - Azure DevOps
- Cloud services to conduct experiments and deploy the machine learning pipeline:
  - AWS SageMaker (https://aws.amazon.com/sagemaker/)
  - Azure Machine Learning (https://docs.microsoft.com/en-us/azure/machine-learning/overview-what-is-azure-ml)
  - Databricks (https://databricks.com/)
If you are interested in learning more about the similarities and differences between AWS SageMaker and Azure Machine Learning, please check out my article (https://towardsdatascience.com/aws-sagemaker-vs-azure-machine-learning-3ac0172495da).