MLOps made simple: how to run a batch prediction pipeline using Azure Machine Learning components

You just want to take your model’s .pt file, upload it somewhere and get the predictions, right? Let’s see how to do that using AML infrastructure

Déborah Mesquita
Towards Data Science


In our day-to-day as data scientists we often need to use models for internal purposes. If we worked in a team of one it would be fine to run them on our own machines, but we usually need to share our work with teammates. Sharing Jupyter Notebooks that run our scripts is one way of doing it, but notebooks become hard to manage and operate once we pass the experimentation phase.

Recently Azure introduced components, a “self-contained piece of code that does one step in a machine learning pipeline”. We can use components to build independently executable workflows and share them with other teammates. In today’s article we’re going to use components to create a Feature Matching pipeline.

Note: if you’re totally new to Azure Machine Learning it might be good to read this other article where I give a brief review of some AML concepts.

MLOps 101

As Ville Tuulos points out in the book Effective Data Science Infrastructure: How to make data scientists productive (a great read, I highly recommend it), there are three building blocks for a data science infrastructure:

  • Architecture: what the actual code looks like and how the system looks and feels to the data scientist
  • Job scheduler: how workflows are triggered, executed, and monitored and how failures are handled
  • Compute resources: where the code executes in practice

For us the Architecture part consists of our Python code. This part doesn’t change much from platform to platform. The Job scheduler and the Compute resources do change depending on the platform where we run the workflows.

Most data science workflow failures happen not because of the code itself but because something changed in the data or in the environment where the code runs. Each AML component has its own environment and dependencies, which contributes a lot to the stability of our workflows.

Using the AML stack we can build the Architecture with components, use pipeline jobs as our Job scheduler, and use compute clusters or compute instances as the Compute resources that run our workflows.

Now let’s see how we can use all that inside an AML Workspace.

Our batch prediction pipeline

We’re going to create a batch prediction pipeline using SuperGlue and SuperPoint [1]. Our input will have a target image (a movie poster, for example), and for each image in the dataset we want to know whether that poster appears in it or not.

Have you heard of Feature Matching using OpenCV?

Brute-Force Matching with SIFT Descriptors and Ratio Test
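If not, here is a minimal sketch of the classical approach with OpenCV (the file names poster.jpg and scene.jpg are illustrative):

import cv2

# Classical feature matching: SIFT descriptors + brute-force matching + ratio test
img1 = cv2.imread("poster.jpg", cv2.IMREAD_GRAYSCALE)  # the target image
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # a candidate image

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# For each descriptor in img1, find the 2 nearest descriptors in img2
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1, des2, k=2)

# Lowe's ratio test: keep a match only if it's clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches")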

SuperGlue and SuperPoint accomplish that using deep learning. The best part is that we don’t need to retrain the models, since SuperPoint is trained in a self-supervised manner [2].

AML Components

Components require a well-defined interface (inputs and outputs). We can share and reuse them across pipelines, and they’re versioned, so we can keep improving the code and making modifications as we go. The code of a component can be in any language (Python, R, etc.); the only requirement is that it can be executed by a shell command.

Our main component will read data from a data asset, run SuperPoint+SuperGlue, and output a CSV file with the URLs of the images that contain the target image. The input data asset will have the list of images we want to verify and a .jpeg file with the movie poster.

We can build components using the Azure ML CLI v2 or the Azure ML SDK v2. Here we’ll use the Python SDK. The code is adapted from the superglue_rank_images.py script written by Ariya Sontrapornpol. This is what the component looks like:

from mldesigner import command_component, Input, Output
from pathlib import Path

@command_component(
    name="superpoint_and_superglue",
    version="1",
    display_name="Superpoint and Superglue",
    description="Run the Superpoint and Superglue models and output a csv with matching images",
    environment=dict(
        conda_file=Path(__file__).parent / "conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def superpoint_and_superglue_component(
    input_data: Input(type="uri_folder"),
    data_output: Output(type="uri_folder"),
):
    import os
    import random
    import shutil
    import requests
    import numpy as np
    import pandas as pd
    import cv2
    import matplotlib.cm as cm
    from pathlib import Path

    # Avoid crashes when more than one OpenMP runtime gets loaded
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

    import torch

    import models.matching
    import models.utils

    def ranking_score(matches, match_confidence):
        # Weight each match by its confidence and sum everything up
        return np.sum(np.multiply(matches, match_confidence)).astype(np.float32)

    def load_models(device, nms_radius, keypoint_threshold, max_keypoints, superglue, sinkhorn_iterations, match_threshold):
        # Load the SuperPoint and SuperGlue models
        print('Running inference on device "{}"'.format(device))
        config = {
            'superpoint': {
                'nms_radius': nms_radius,
                'keypoint_threshold': keypoint_threshold,
                'max_keypoints': max_keypoints
            },
            'superglue': {
                'weights': superglue,
                'sinkhorn_iterations': sinkhorn_iterations,
                'match_threshold': match_threshold,
            }
        }
        matching = models.matching.Matching(config).eval().to(device)
        return matching

    def get_scores(timer, device, pairs, matching, input_dir, output_dir, viz, viz_extension, resize, resize_float, query):
        # [...]
        ...

    def main():
        torch.set_grad_enabled(False)

        input_dir = input_data
        print('Looking for data in directory "{}"'.format(input_dir))
        output_dir = Path(input_data) / "superglue_outputs"
        output_dir.mkdir(exist_ok=True, parents=True)
        print('Will write matches to directory "{}"'.format(output_dir))

        force_cpu = False
        device = 'cuda' if torch.cuda.is_available() and not force_cpu else 'cpu'
        timer = models.utils.AverageTimer(newline=True)

        pairs = [["query.jpg", "query.jpg"], ["query.jpg", "movie_poster.jpg"]]

        # Keep only the image files from the input folder
        all_uri_files = os.listdir(input_data)
        candidate_image_filenames = []
        for image_name in all_uri_files:
            if image_name.endswith(('.jpg', '.png', '.jpeg')):
                candidate_image_filenames.append(image_name)

        nms_radius = 4
        keypoint_threshold = 0.005
        max_keypoints = 1024
        superglue = "indoor"
        sinkhorn_iterations = 20
        match_threshold = 0.2

        matching = load_models(device, nms_radius, keypoint_threshold, max_keypoints, superglue, sinkhorn_iterations, match_threshold)

        output = []
        query = output_dir / "query.jpg"

        for image in candidate_image_filenames:
            # Each candidate image takes a turn as the query of the pair
            shutil.copyfile(os.path.join(input_dir, image), query)

            viz = False
            viz_extension = "png"
            resize = [1200, 900]
            resize_float = False

            _, predicted_score = get_scores(timer, device, pairs, matching, input_dir, output_dir, viz, viz_extension, resize, resize_float, query)
            output.append({"score": predicted_score, "image": image})

        return output

    output = main()
    pd.DataFrame(output).to_csv(Path(data_output) / "results.csv", index=False)

The @command_component decorator transforms the Python function into a static specification (YAML) that the pipeline service uses (a sketch of that YAML follows the list below). We need to provide this metadata [3]:

  • name is the unique identifier of the component
  • version is the current version of the component. A component can have multiple versions.
  • display_name is a friendly display name of the component in the UI; it doesn’t need to be unique
  • description usually describes what task this component can complete
  • environment specifies the run-time environment for this component. Here it’s a Docker image plus a reference to the conda.yaml file
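
For reference, the static specification generated from the decorator looks roughly like the following (a sketch based on the AML command component YAML schema; the actual command field, which mldesigner fills in to invoke our Python function, is elided):

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: superpoint_and_superglue
version: "1"
display_name: Superpoint and Superglue
description: Run the Superpoint and Superglue models and output a csv with matching images
inputs:
  input_data:
    type: uri_folder
outputs:
  data_output:
    type: uri_folder
environment:
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
  conda_file: conda.yaml
command: "..."  # generated by mldesigner to invoke superpoint_and_superglue_component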

Being able to provide the environment for each component is great because we can isolate the dependencies that each component uses.
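
As an illustration, the conda.yaml referenced in the decorator could look something like this (a sketch; pin the versions that match your setup):

name: superglue-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - mldesigner
      - torch
      - opencv-python-headless
      - pandas
      - numpy
      - matplotlib
      - requests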

The uri_folder input type has a read-only mount mode and the uri_folder output type has a read-write mount mode. For more info about accessing data in pipeline jobs you can refer to Access data in a job and Data concepts in Azure Machine Learning.
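If the defaults don’t fit your scenario, you can set the mode explicitly when wiring up the data. Here is a small sketch using the azure.ai.ml Input class (the data asset name "my-data-asset" is a placeholder):

from azure.ai.ml import Input

# Download the folder to the compute instead of mounting it
input_data = Input(type="uri_folder", path="azureml:my-data-asset:1", mode="download")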

To register this component in our AML workspace we’ll use the Python SDK as well; here is the code:

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check if the given credential can get a token successfully
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential doesn't work
    credential = InteractiveBrowserCredential()

# Get a handle to the workspace
ml_client = MLClient.from_config(credential=credential)

# IMPORTANT: here we import the function we've created previously
from src.component import superpoint_and_superglue_component

# Register the component
ml_client.components.create_or_update(superpoint_and_superglue_component, version="1.3")

Now we can see the component in the AML workspace.

The component we’ve created

The next step is to create the pipeline to use the component.

AML Pipelines

We can define pipelines using the Azure ML CLI v2, the Azure ML SDK v2, or the Designer. Since our pipeline is simple, we’ll define it using the Designer. To create the input data asset we’ll also use the workspace UI.

Creating a data asset using the UI

Since our pipeline has a custom component, we need to create a custom pipeline.

Tab to create a custom pipeline

To submit the pipeline run we need to specify the compute resource. You can create one using the UI as well.

Creating the pipeline using Designer
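
By the way, if you’d rather define the pipeline in code than in the Designer, here is a rough sketch of an equivalent SDK v2 definition (the data asset name "movie-poster-images" and the compute cluster name "cpu-cluster" are placeholders for whatever you created in the UI):

from azure.ai.ml import dsl, Input

# ml_client is the handle we created in the registration step.
# Fetch the registered component; teammates can do this without the source code
superpoint_and_superglue = ml_client.components.get(
    name="superpoint_and_superglue", version="1.3"
)

@dsl.pipeline(compute="cpu-cluster", description="Batch feature matching pipeline")
def matching_pipeline(pipeline_input):
    # Single step: run the component against the input data asset
    matching_step = superpoint_and_superglue(input_data=pipeline_input)
    return {"matches": matching_step.outputs.data_output}

# Point the pipeline at the registered data asset
input_asset = ml_client.data.get(name="movie-poster-images", version="1")
pipeline_job = matching_pipeline(
    pipeline_input=Input(type="uri_folder", path=input_asset.id)
)

pipeline_job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="feature-matching")
print(pipeline_job.studio_url)  # follow the run in the workspace UI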

Final thoughts

AML components are a great way to organize our code in our day-to-day data science activities. Since they can be written in any language, we can run the code on any platform that can run shell commands, not only inside Azure Machine Learning.

By using AML, though, we get the UI to manage them, create data assets, manage our compute resources, and so on. The AML ecosystem offers a great way to start applying MLOps principles in our workflows.

References

[1] https://github.com/jomariya23156/SuperGlue-for-Visual-Place-Recognition

[2] DeTone, Daniel, Tomasz Malisiewicz, and Andrew Rabinovich. “Superpoint: Self-supervised interest point detection and description.” Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018.

[3] https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipeline-python
