Putting a Machine Learning Model into Production with Google Cloud Platform and DVC

Thanakorn Panyapiang
Towards Data Science
Mar 1, 2022


Introduction

Machine learning models do not have much value as standalone artifacts. No matter how impressive their performance, they don’t bring any substantial value to the business until they are delivered to the users who need them. However, deployment is often considered only at the very end of ML projects because it’s not a subject data scientists are familiar with.

This article demonstrates one simple way to put ML models into action, along with the tools used to accomplish it, so that those who have never deployed a model can get a better idea of what it looks like.

Model Deployment Strategies

Before we start, let’s look at the options we have for deploying a model. There are multiple ways to put ML models into production, which can be roughly categorized into three groups as follows:

  • Model-as-a-service
    The model is part of a piece of software that serves users on request in real time. Mostly, it comes in the form of a web service or API, where the model’s input and output are transferred over HTTP or another well-known protocol. (A minimal sketch of this pattern is shown right after this list.)
Model-as-a-service strategy (Image by author).
  • Batch prediction
    As opposed to real-time serving, the batch prediction strategy performs inference offline. The input is prepared for the model by a data pipeline and, once processing is finished, the result is saved to a database or data storage before being delivered to the users. The processing pipeline can be triggered by a scheduled job or a user request, but the output is not served in real time.
Batch prediction strategy (Image by author).
  • Model-on-edge
    Unlike the previous two strategies, where input and output are transferred back and forth over the internet, the model-on-edge strategy uses the ML model to make a prediction as soon as a sensor receives the data, enabling real-time decisions. This pattern is common in automation systems, robotics, and IoT applications.
Model-on-edge strategy (Image by author).
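
To make the model-as-a-service pattern concrete, here is a minimal sketch of what such a web service could look like using Flask. This is illustrative only and not part of the pipeline built in this article; the model.pt file name and the /predict route are assumptions:

# Minimal model-as-a-service sketch (illustrative, not the pipeline built below).
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
model = torch.load('model.pt')  # assumes a full model object was saved with torch.save
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    # The client sends input features as JSON and receives a prediction in real time.
    features = torch.tensor(request.json['features'])
    with torch.no_grad():
        output = model(features)
    return jsonify({'prediction': output.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)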

Overview

We’re going to create a workflow that takes images from cloud storage, performs image segmentation, and puts the results back into cloud storage. As you may have guessed, the strategy we use here is batch prediction.

Model

The model we want to deploy is an image segmentation model that identifies clouds in an image. The input and output of the model are as follows:

Model input and output (Image by author).

The details regarding model architecture, training procedure, and performance are out of this article’s scope. We’re going to assume that the model is trained and ready to use.

Data Pipeline

Our batch prediction pipeline consists of four steps, as follows:

  1. A user uploads images to the cloud storage bucket. (In reality, this step might be handled by the system rather than by a manual upload, but that doesn’t affect how the batch prediction pipeline works.)
  2. A scheduled job sends an HTTP request to the API to trigger the computation.
  3. The compute service starts the computation by:
    3.1. Downloading the trained model
    3.2. Downloading all images in the input cloud storage bucket
    3.3. Running inference on all images
  4. All outputs are saved to the output cloud storage bucket.

Below is a diagram showing what the workflow looks like:

Data Pipeline (Image by author).

Model Registry

Another key aspect of model deployment is how a software component (whether an application, a data pipeline, etc.) accesses the trained model. One option would be to store trained models on remote storage and put the file path in a config file. Although valid, this approach disconnects model training from model deployment, which leads to undesirable situations: the application doesn’t realize a newer version of the model exists, the system fails to start because the file has been moved to a new path, and so on. A better and more widely accepted approach nowadays is to use a model registry.

A model registry is a centralized repository that stores, versions, serves, and manages trained ML models. It acts as a bridge between model development and model deployment, where the former only needs to know where the model should be registered and the latter only needs to know which model to use. In this post, we’re going to use Data Version Control (DVC) to create a simple model registry from a GitHub project.

Implementation

Pipeline components

We’ll use Google Cloud Storage for the input and output buckets and Cloud Scheduler for the scheduled job. Below are the commands to create the buckets and the scheduled job using the Google Cloud SDK. You can also do this via the GCP Console.
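
A sketch of what these commands could look like is below. The bucket names match those used in the deployment workflow later in this article; the job name, schedule, and function URL are placeholders:

# Create the input and output buckets.
gsutil mb gs://cloud-segmentation-input
gsutil mb gs://cloud-segmentation-output

# Create a scheduled job that triggers the compute service over HTTP.
# The schedule (hourly here) and the target URL are placeholders.
gcloud scheduler jobs create http trigger-segmentation \
    --schedule="0 * * * *" \
    --http-method=POST \
    --uri="https://<REGION>-<PROJECT_ID>.cloudfunctions.net/cloud-segmentation"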

For the compute service, Cloud Functions is used for simplicity, as it requires only the Python code and its dependencies. Note that, in reality, it might be better to use services such as Cloud Run or Compute Engine for computationally intensive tasks like image processing and deep learning. The code for running inference is as follows:

import torch

def run_inference(request):
    # Entry point of the Cloud Function, triggered by the scheduled job.
    model = load_model()  # fetch the trained model from the model registry
    with torch.no_grad():
        for blob in input_bucket.list_blobs():
            obj_name = blob.name
            img = load_image(obj_name)   # read the image from the input bucket
            out = inference(model, img)  # run segmentation on the image
            save_image(out, obj_name)    # write the result to the output bucket
            blob.delete()                # remove the processed input

The function load_model is responsible for connecting to the model registry and downloading the trained model, while load_image, save_image, and inference read and write images from and to the designated buckets and perform inference, respectively.
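
These helpers are not shown in the original snippet; below is a rough sketch of what they could look like, assuming the google-cloud-storage and Pillow libraries and a model that outputs a per-pixel probability map. All names and signatures here are assumptions:

import io
import os

import torch
import torchvision.transforms.functional as TF
from google.cloud import storage
from PIL import Image

# Module-level clients, shared with run_inference above.
client = storage.Client()
input_bucket = client.bucket(os.environ['INPUT_BUCKET_NAME'])
output_bucket = client.bucket(os.environ['OUTPUT_BUCKET_NAME'])

def load_image(obj_name):
    # Download the image bytes from the input bucket and convert them to a tensor.
    data = input_bucket.blob(obj_name).download_as_bytes()
    img = Image.open(io.BytesIO(data)).convert('RGB')
    return TF.to_tensor(img).unsqueeze(0)  # shape: (1, C, H, W)

def inference(model, img):
    # Run the segmentation model and binarize the predicted probability map.
    out = model(img)
    return (out.squeeze() > 0.5).to(torch.uint8) * 255

def save_image(out, obj_name):
    # Encode the mask as PNG and upload it to the output bucket.
    buffer = io.BytesIO()
    Image.fromarray(out.numpy()).save(buffer, format='PNG')
    buffer.seek(0)
    output_bucket.blob(obj_name).upload_from_file(buffer, content_type='image/png')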

Model Loading

As mentioned earlier, we’ll create a simple model registry using DVC. In short, DVC is a tool for data science projects that lets you version your data in much the same way Git versions your source code. One handy feature of DVC is that it lets you turn a repository into a data registry with almost no effort: given a DVC project with a set of tracked files, you can list and download those files like this.

Loading a model using DVC (Image by author).
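
For reference, with DVC’s CLI those commands would look roughly like this (the repository URL and model path are placeholders):

# List every DVC-tracked file in a Git repository.
dvc list https://github.com/<user>/<repo> --dvc-only -R

# Download a tracked file (e.g. the trained model) without cloning the repository.
dvc get https://github.com/<user>/<repo> models/model.pt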

Besides the CLI, DVC also provides a Python API that lets you use these functionalities programmatically. We use this API to load the trained model from the repository as follows:

import io

import dvc.api
import torch
import yaml

from model import Model  # assumed local module defining the segmentation model

repo_url = 'https://github.com/<user>/<repo>'  # placeholder repository URL
params_path = 'params.yaml'                    # assumed path of the tracked params file
model_path = 'models/model.pt'                 # assumed path of the tracked weights

def load_model():
    # Read the model hyperparameters tracked in the DVC repository.
    with dvc.api.open(path=params_path, repo=repo_url, mode='rb') as f:
        model_params = yaml.safe_load(io.BytesIO(f.read()))['model']
    # Download the trained weights and restore the model.
    with dvc.api.open(path=model_path, repo=repo_url, mode='rb') as f:
        state_dict = torch.load(io.BytesIO(f.read()))
    model = Model(n_classes=model_params['n_classes'],
                  in_channel=model_params['in_channels'])
    model.load_state_dict(state_dict)
    return model

Deployment

Even though we could deploy the code manually, it’s better to automate the process. We can use GitHub Actions to automate the Cloud Functions deployment with the google-github-actions/deploy-cloud-functions action provided by Google, as follows:

name: Merge

on:
  push:
    branches:
      - main

jobs:
  deploy_function:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: "Authenticate GCP"
        id: auth
        uses: google-github-actions/auth@v0
        with:
          credentials_json: ${{ secrets.gcp_credentials }}
      - name: "Deploy a Cloud Function"
        id: deploy-function
        uses: google-github-actions/deploy-cloud-functions@v0
        with:
          name: cloud-segmentation
          runtime: python37
          entry_point: run_inference
          memory_mb: 2048
          deploy_timeout: 600
          env_vars: INPUT_BUCKET_NAME=cloud-segmentation-input,OUTPUT_BUCKET_NAME=cloud-segmentation-output

After adding this workflow, whenever a new PR is merged into the main branch, the Cloud Function is deployed automatically. The GCS buckets and the scheduler are only created once, so we can do that manually.

The GIF below shows what the pipeline looks like when we put all components together.

All components assembled together (Image by author).

Conclusion

As you can see, although we already have a trained model in place, there is still a lot of work to be done before it can provide real value. Choosing the deployment strategy, picking the model-serving pattern, designing the infrastructure, and building the CI/CD pipeline are all as crucial to the success of machine learning projects as using the right algorithm, ensuring data quality, and training the model to achieve high accuracy.


A Machine Learning Enthusiast who is passionate about Artificial Intelligence and Software Development. https://www.thanakornp.com/