ML Ops with Azure Machine Learning

High-level Conceptual Overview of ML DevOps Pipeline implementation framework

Pankaj Jainani
Towards Data Science

--

Introduction

Azure Machine Learning Service (AML) offers end-to-end capabilities to manage the ML lifecycle. MLOps (Machine Learning Operations), framework-agnostic interoperability, integrations with ML tools & platforms, security & trust, and extensibility & performance are the key characteristics.

Azure Machine Learning SDK in Python or PowerShell provides an opportunity to develop the automated ML pipeline powered by the orchestration capabilities of Azure DevOps or GitOps. Thereby, ML engineers and Data Scientists are allowed to focus on their tasks, and DevOps (build, deploy and manage) are offloaded through an automated process.

In this post, we will be discussing the generic flow of such MLOps using Azure platform services. Of course, this framework can easily be extended to the other public cloud propositions (AWS or GCP) with minor tweaks, whereby mapping to their native service offerings.

Photo by Gerrie van der Walt on Unsplash

Solution Framework

Develop & Build

Pipeline for environment Build (Image by Author)

Following are the high-level steps to create ML workspace and environment build:

  • Leverage Azure DevOps (ADO) as an orchestration platform to automate infrastructure using the IaC approach
  • Data science teams (Data Scientists, ML Engineers, Data engineers, etc) are assigned and enabled to the ML Workspace by Azure IAM.
  • The team is allocated the compute within the workspace for training and testing purposes.
  • The team churns the ML pipeline experiments, each individual can work in a silo and develop a reusable component for the pipeline. The experiment thus consists of data extraction, data exploration, data-preprocessing, model training, evaluation, and testing.
  • The ML engineers then set up source control and test workbench to support model development and testing framework.
  • The environment provisioning also enables the packages, ML frameworks, and modules to support development.

Training Pipeline

Automate Model Training and Experimentation using Azure ML & Azure DevOps (Image by Author)

Azure ML enables the ML Team to accomplish end-to-end model experimentation with an automated training pipeline by leveraging Azure DevOps capabilities:

  • Automate the data preprocessing and model training by executing pipelines steps.
  • Tracking the outcome of the metrics from various model training experiments helps to evaluate models from many runs. The best model is thus chosen.
  • The model registration step registers the chosen model with its meta-data.
  • Thereby, it offers capabilities of model management from within the Azure ML service’s capabilities.

Deployment Pipelines

Automate Deployment by using Azure DevOps (Image by Author)

Azure DevOps offers the power to automate the CI & CD steps required for the ML Operations, the step-by-step process to containerize the final evaluated model, enable its deployment to the staging environment clusters, getting endpoints to execute QA & testing pipelines. The developer learns from the telemetry & metrics regarding the quality and performance of the deployed model on test data.

The Release Manager is responsible for triggering the deployment of the validated model to the production clusters. There are various strategies to accomplish the same; one such example is canary deployment, this is particularly beneficial to closely monitor the model performance on a smaller subset of incoming data instances.

Model Monitoring & Retraining

Model Monitoring & Retraining (Image by Author)

Setup operations, observability, and a control tower to continuously monitor the model drift and data drift. If the model performance metrics fall below the threshold, DevOps pipelines get triggered and retrain the model using ML training pipeline.

The logs and application telemetry are essential components to implement end-to-end metrics-driven automated MLOps.

Conclusion

The high-level solution architecture for implementing MLOps using Azure DevOps, tooling, and Azure Machine Learning service will look similar to the below figure. This solution provides an overview to set up development, training, testing, and deployment components of the entire MLOps ecosystem. Observability implementation is the core to capture telemetry and metrics data to enable event-driven automation for the entire MLOps process by leveraging Azure DevOps pipelines.

Azure Machine Learning — MLOps Solution Architecture (Image by Author)

References

--

--

Here to learn about Artificial Intelligence & Cloud Computing | LinkedIn: http://www.linkedin.com/in/p-jainani | Twitter: @pankajjainani