A Simple Way to Deploy Any Machine Learning Model

How to use Azure Functions to expose a REST API endpoint for serving ML models that run on another machine

Douglas Coimbra de Andrade
Towards Data Science

--

Is there an easy way to deploy a powerful image segmentation model to a mobile app? The answer is yes.

When a data scientist develops a machine learning model, be it using Scikit-Learn, deep learning frameworks (TensorFlow, Keras, PyTorch) or custom code (convex programming, OpenCL, CUDA), the ultimate goal is to make it available in production.

However, much of the time we just want a simple way to call a REST API and get predictions from the current model. For example, we may just want to show how the model integrates with a user interface without worrying about setting up an environment, scaling to production, load balancing or any of that.

This post will demonstrate the simplest way I found to:

  • Build a machine learning model;
  • Expose an endpoint (REST API) so that a website/app can POST a message to http://myendpoint.com/predict and get results back.

All source code and installation instructions are available at https://github.com/douglas125/AzureMLDeploy

We won’t worry about exporting models, creating Docker images, running directly in the browser, installing packages or any of that; there are plenty of resources that deal with those issues (see References). In this article, the machine learning server (MLServer) can be a local computer or a virtual machine in the cloud. It retrieves the data that needs to be processed and sends the results back.

In addition, serverless architectures are among the cheapest and easiest to implement out there: essentially, you are charged for storage and execution time, so if we don’t need to store inputs and outputs forever, costs should remain very low.

How does it work?

The Azure Function App acts as a middleman that receives and stores requests from clients and answers from the MLServer. When the client comes back to retrieve the results, the Function App delivers them.

This looks very simple — and it is. Exactly what we are looking for in this article.

Example 1: Simple Ajax page

Following the installation instructions in the repository, it is possible to build a simple Ajax page that summarizes the workflow of the client and MLServer sides. Note that, while the HTML page is extremely simple, it still demonstrates how to POST data to the endpoints from the client side (to send jobs and retrieve results) and from the MLServer side (to get jobs and send results).

Simple Ajax page to test the Azure Function App. Sequence: 1 — client posts task and receives task_id; 2 — MLServer fetches the task and processes it; 3 — MLServer sends results to the Function App; 4 — client retrieves results.
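As a rough Python equivalent of what the Ajax page does, the sketch below posts a payload and polls for the result. The Function App URL and the response conventions (plain-text task_id, 404 until the result is ready) are assumptions, so adapt them to your deployment.

```python
# Minimal client-side round trip, mirroring steps 1 and 4 above.
# BASE is a made-up Function App address; replace it with your own.
import time

import requests

BASE = "https://myfunctionapp.azurewebsites.net/api"

# 1 - post the task; the server replies with a fresh task_id
task_id = requests.post(f"{BASE}/predict", data=b"payload bytes").text

# 4 - poll until the MLServer has stored a result for this task
while True:
    resp = requests.get(f"{BASE}/getresult", params={"task_id": task_id})
    if resp.status_code == 200:
        break
    time.sleep(1)  # result not ready yet; try again shortly

print(f"Received {len(resp.content)} result bytes")
```

Any language that can issue HTTP requests works here; the Ajax page simply does the same thing from the browser.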

Example 2: Running DeepLabV3+ as a service

C# is a nice tool to build user interfaces for Windows. This example shows how to do inference using Python (which serves as the MLServer). Once again, all code is available in the repository.

The sequence of images below demonstrates what happens on the client side and the MLServer side. Note that the MLServer can be any computer or virtual machine running the Keras model in the Jupyter Notebook.

Client C# application loads an image, resizes it to 512x512 (the DeepLabv3+ input size) and sends it to the Azure Function App Server.
A local computer or a virtual machine can be used to make predictions. In this case, we used Google Colab as the MLServer.
Client C# application receives the predictions (a .png file with the masks) and displays the results to the user in a friendly interface.
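To make the MLServer side concrete, here is a minimal sketch of such a polling loop. It assumes bonlime’s keras-deeplab-v3-plus (see References) is importable as "from model import Deeplabv3", and it reuses the hypothetical BASE url and task-id header conventions from the earlier client sketch.

```python
# Hypothetical MLServer loop: fetch a job, segment the image, post the mask.
import io
import time

import numpy as np
import requests
from PIL import Image
from model import Deeplabv3  # https://github.com/bonlime/keras-deeplab-v3-plus

BASE = "https://myfunctionapp.azurewebsites.net/api"
model = Deeplabv3()  # defaults to 512x512x3 inputs, 21 Pascal VOC classes

while True:
    job = requests.get(f"{BASE}/getnexttask")
    if job.status_code != 200:
        time.sleep(1)  # no pending work; poll again shortly
        continue
    task_id = job.headers["task-id"]  # hypothetical header set by the server
    img = Image.open(io.BytesIO(job.content)).convert("RGB").resize((512, 512))
    x = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # scale to [-1, 1]
    labels = model.predict(x[None, ...])[0].argmax(axis=-1)  # class per pixel
    mask = Image.fromarray(labels.astype(np.uint8))
    buf = io.BytesIO()
    mask.save(buf, format="PNG")
    requests.post(f"{BASE}/puttaskresult",
                  params={"task_id": task_id}, data=buf.getvalue())
```

This loop runs equally well on a local PC or on Google Colab, which is exactly why no dedicated inference VM is needed.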

Keeping it Simple

Serverless architectures like Azure Functions or AWS Lambda are excellent ways to enable cloud computing: they handle the problem of scaling the service, they are usually easy to use and, in many cases, they are also the cheapest option, since they do not need a dedicated server and users pay per use (consumption plan). I chose Azure Functions simply because of my familiarity with C#.

We will build a simple serverless architecture that receives user requests, lets a remote machine read them and store its predictions in the cloud, and lets users retrieve the final results. Note that the Function App can be configured to require authentication, record which users made requests (to charge them later) and much more; this is just starter code to be used as a simple base.

The workflow is as follows:

  • User makes a prediction request by POSTing a message to endpoint/predict with a payload: text, image, audio or whatever data needs to be analyzed;
  • Server receives user request, saves payload and generates a unique task_id, which is sent back to User;
  • MLServer (machine learning server) queries Server to retrieve the next job;
  • Server sends task_id and payload of next job to MLServer;
  • MLServer processes payload and sends results of task_id to Server;
  • Server receives and stores task_id results;
  • User queries Server using task_id and retrieves results.
Client — Server — MLServer architecture

Setting up the service

We already discussed what Client and MLServer do. Now it is time to dive into the Azure Function App Server.

The Server needs to handle requests from Client and MLServer. We need four endpoints:

  • predict — receives payload and creates task_id;
  • getresult — receives task_id and returns the results for that task (if available);
  • getnexttask — sends task_id and payload to MLServer;
  • puttaskresult — receives task_id and stores result.

We will use Azure Blob storage to keep payloads and results. We need two folders: inbox (where unfinished tasks will be stored) and outbox (to store results). If we want to use multiple servers to process data in parallel, we probably would want to keep track of what tasks are being processed (so that the same task is not executed twice). However, that is beyond the scope of this article and would add extra, unnecessary complexity for our objectives.
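The repository implements the Function App in C#; purely as an illustration, a minimal sketch of the first two endpoints in Azure Functions’ Python programming model could look like the following (the storage connection string and the 404-until-ready convention are assumptions):

```python
# Hypothetical Python rendition of the "predict" and "getresult" functions.
import uuid

import azure.functions as func
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")

def predict(req: func.HttpRequest) -> func.HttpResponse:
    """Store the payload in 'inbox' and hand a fresh task_id to the client."""
    task_id = uuid.uuid4().hex
    service.get_blob_client("inbox", task_id).upload_blob(req.get_body())
    return func.HttpResponse(task_id)

def getresult(req: func.HttpRequest) -> func.HttpResponse:
    """Return the stored result for task_id, or 404 if it is not ready yet."""
    task_id = req.params.get("task_id")
    blob = service.get_blob_client("outbox", task_id)
    if not blob.exists():
        return func.HttpResponse(status_code=404)
    return func.HttpResponse(blob.download_blob().readall())
```

Answering 404 while the result blob does not exist is what lets the client poll getresult until the MLServer finishes the job.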

In the puttaskresult function, we delete the payload from the inbox folder and save the results in the outbox folder. If we wanted, we could copy the payload to a “processed” folder before deleting it, but we’ll keep it simple here.
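Continuing the same hypothetical sketch, getnexttask and puttaskresult round out the four endpoints:

```python
# Hypothetical Python rendition of "getnexttask" and "puttaskresult".
def getnexttask(req: func.HttpRequest) -> func.HttpResponse:
    """Hand the first pending payload to the MLServer, tagged with its task_id."""
    inbox = service.get_container_client("inbox")
    for blob in inbox.list_blobs():  # any job still waiting?
        data = inbox.download_blob(blob.name).readall()
        return func.HttpResponse(data, headers={"task-id": blob.name})
    return func.HttpResponse(status_code=204)  # nothing to do right now

def puttaskresult(req: func.HttpRequest) -> func.HttpResponse:
    """Store the result in 'outbox', then delete the payload from 'inbox'."""
    task_id = req.params.get("task_id")
    service.get_blob_client("outbox", task_id).upload_blob(req.get_body())
    service.get_blob_client("inbox", task_id).delete_blob()
    return func.HttpResponse("OK")
```

Note that getnexttask hands out whatever pending blob it finds first; as discussed above, tracking in-flight tasks would be needed before running several MLServers safely in parallel.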

For all implementation details, the interested reader should refer to this repository. It has all source code for the Function App as well as installation instructions.

Conclusion

This article demonstrated a very simple way to deploy machine learning models to client applications using Azure Functions to store and serve requests and prediction results. While this method is not as powerful as TensorFlow Serving or as versatile as TensorFlow.js, it has important advantages:

  • It is very simple to deploy;
  • It allows the programmer to serve ANY machine learning model, not just neural networks, along with any pre- and post-processing steps;
  • There is no need to convert or dockerize anything;
  • There is no need to allocate a dedicated VM — inference can be run locally;
  • Since inference can be run locally and Function Apps are charged per request/storage, this is a cheap option to demonstrate concepts.

Of course there are drawbacks:

  • It is not the best option for serving models that need to run in near real-time (i.e., that have execution time constraints);
  • It won’t run in the client like tf.js;
  • Load balance to MLServers has to be done manually.

Note that, more often than not, these drawbacks are not something developers worry about during early stages of prototyping/development — they can simply leave one PC on (or even Google Colab) and make the REST API available for the user interface team.

This is the simplest way I found to quickly deploy a machine learning algorithm in its current state to a client application.

References

Tensorflow.js. https://js.tensorflow.org/

Deploy models with the Azure Machine Learning service. https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where

A guide to deploying Machine/Deep Learning model(s) in Production. https://medium.com/@maheshkkumar/a-guide-to-deploying-machine-deep-learning-model-s-in-production-e497fd4b734a

How to EASILY put Machine Learning Models into Production using Tensorflow Serving. https://medium.com/coinmonks/how-to-easily-put-machine-learning-models-into-production-using-tensorflow-serving-91998fa4b4e1

What We Learned by Serving Machine Learning Models Using AWS Lambda. https://medium.freecodecamp.org/what-we-learned-by-serving-machine-learning-models-using-aws-lambda-c70b303404a1

Image Classification — Scoring sample. https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/getting-started/DeepLearning_ImageClassification_TensorFlow

ML.NET — An open source and cross-platform machine learning framework. https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet

A beginner’s guide to training and deploying machine learning models using Python. https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a

Deploying a Machine Learning Model as a REST API. https://towardsdatascience.com/deploying-a-machine-learning-model-as-a-rest-api-4a03b865c166

Deploying Machine Learning Models is Hard, But It Doesn’t Have to Be. https://www.anaconda.com/blog/developer-blog/deploying-machine-learning-models-is-hard-but-it-doesnt-have-to-be/

HOW WE DEPLOYED A SCIKIT-LEARN MODEL WITH FLASK AND DOCKER. https://blog.solutotlv.com/deployed-scikit-learn-model-flask-docker/?utm_medium=How-do-I-deploy-Machine-Learning-Models-as-an-API&utm_source=quora

File:Policja konna Poznań.jpg. https://commons.wikimedia.org/wiki/File:Policja_konna_Pozna%C5%84.jpg

By LukaszKatlewa — Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=49248622

Keras implementation of Deeplabv3+. https://github.com/bonlime/keras-deeplab-v3-plus

--

I obtained my D.Sc. in HPC applied to computer vision. I have a special interest in deep learning architectures applied to speech, sensors and video analytics.