
7 Lessons I’ve Learnt From Deploying Machine Learning Models Using ONNX

A great way to build once and deploy everywhere

Photo by olia danilevich on Pexels

In this post, we will outline key learnings from a real-world example of running inference on a scikit-learn model using the ONNX Runtime API in an AWS Lambda function. This is not a tutorial but rather a guide focusing on useful tips, points to consider, and quirks that may save you some head-scratching!

What is ONNX?

The Open Neural Network Exchange (ONNX) format is a bit like dipping your french fries into a milkshake; it shouldn’t work but it just does. (Well, for me anyway).

ONNX allows us to build a model using all the training frameworks we know and love, like PyTorch and TensorFlow, and package it up in a format supported by many hardware architectures and operating systems. The ONNX Runtime is a simple, cross-platform API that provides high-performance inference on an ONNX model exactly where you need it: the cloud, mobile, an IoT device, you name it!
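To give a flavour of what that looks like in practice, here is a minimal sketch of running inference with the onnxruntime-node package; the model path, the float_input name, and the input shape are illustrative placeholders that depend on your own exported model.

const ort = require('onnxruntime-node');

async function run() {
  // Load the exported ONNX model (the path is a placeholder)
  const session = await ort.InferenceSession.create('model.onnx');

  // Wrap the features in a tensor matching the model's expected input shape
  const input = new ort.Tensor('float32', Float32Array.from([5.1, 3.5, 1.4, 0.2]), [1, 4]);

  // The feed key must match the model's input name
  const results = await session.run({ float_input: input });
  console.log(results);
}

run();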

Gone are the days when our programming language or runtime of choice dictated how we build AI.

For a further deep dive into ONNX check out this article: ONNX – Made Easy.

How do we know it works?

At Bazaarvoice, we improve the E-Commerce experience for millions of people by enhancing a product’s user-generated content (UGC). A big part of this is the process of product matching, or in other words, identifying that two products are the same across retailers. Most of this can be done automatically due to unique product identifiers, however, there are millions of products lacking this significant data.

To solve this issue, we built a machine learning model that automates the product matching process, deployed it on a global scale, and now it accurately matches millions of products from some of the biggest brands in the world.

Prerequisites

While some of these lessons apply to multiple workflows, it is important to note that they are based on my team’s experience with the technologies we used:

  • scikit-learn and skl2onnx for training and exporting the model
  • Node.js and the onnxruntime-node package for inference
  • The Serverless Framework and AWS Lambda for deployment

🌎 Lesson 1: Build once, deploy everywhere

One of the best selling points of ONNX is its versatility and its ability to prevent framework lock-in, so don’t get bogged down choosing a language or deployment environment, or even an operating system. Choose what is suitable for you and your project.

There are plenty of training frameworks compatible with the ONNX format and a wide variety of popular deployment runtimes. You can find a handy compatibility guide here.

More information on deployment targets for web applications can be found in the ONNX runtime docs.

⚠️ Lesson 2: "Error: Non tensor type is temporarily not supported."

The error in the heading is an issue we ran into when attempting to run an ONNX model, exported from Python (scikit-learn), in a Node.js runtime using the onnxruntime-node npm package.

Given the error message, all clues hinted that the problem lay within the onnxruntime-node package; however, the real source of the problem was the conversion code from scikit-learn to ONNX.

The fix is quite simple: starting from the sklearn-onnx example code on the onnx.ai website, we add an options argument with zipmap set to False:

# Convert the trained scikit-learn classifier (clr) into ONNX format
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# The model expects a float tensor with four features per row
initial_type = [('float_input', FloatTensorType([None, 4]))]

# zipmap=False makes the classifier output probabilities as a plain tensor,
# which onnxruntime-node can consume (it cannot handle sequences of maps)
onx = convert_sklearn(clr, initial_types=initial_type, options={'zipmap': False})

with open("rf_iris.onnx", "wb") as f:
    f.write(onx.SerializeToString())

By setting zipmap to false, we can now receive probability outputs when running the model in a Node.js runtime.
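To make that concrete, here is a rough sketch of consuming the exported rf_iris.onnx model from onnxruntime-node; the output names are read from the session rather than hard-coded, since they depend on the converter.

const ort = require('onnxruntime-node');

async function classify(features) {
  const session = await ort.InferenceSession.create('rf_iris.onnx');

  // Same shape as the float_input declared during conversion: [None, 4]
  const input = new ort.Tensor('float32', Float32Array.from(features), [1, 4]);
  const results = await session.run({ float_input: input });

  // With zipmap disabled, the classifier returns plain tensors: the predicted
  // label and a float tensor of class probabilities (instead of an unsupported
  // sequence of maps).
  const [labelName, probName] = session.outputNames;
  console.log('label:', results[labelName].data);
  console.log('probabilities:', results[probName].data);
}

classify([5.1, 3.5, 1.4, 0.2]);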

For more on exporting, see the ONNX tutorials GitHub repo. It provides a good description and examples for models developed using machine learning frameworks and cloud services.

📚 Lesson 3: JavaScript libraries for DataFrame manipulation

Several npm libraries provide DataFrame-style data transformation (splitting, joining, group by, etc). Choosing the best one depends on your use case. Two libraries we used were danfo.js and dataframe-js.

danfo.js is built on TensorFlow.js and brings data processing, machine learning, and AI tools to JavaScript developers. It is heavily inspired by Pandas, which means that if you are familiar with the Pandas API, danfo.js should be a smooth transition.

dataframe-js is built with a more functional-programming-inspired API, so if you are more familiar with JavaScript, you may feel more comfortable using dataframe-js than danfo.js.

Both danfo.js and dataframe-js worked great for our use case and provided all the functionality we needed for complex feature engineering. The main drawback of danfo.js is its package size: it was too big for our Serverless application, whereas dataframe-js packaged to under 1 MB, which is ideal for Serverless.
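For a feel of the API, here is a small dataframe-js sketch; the product data and column names are made up purely for illustration.

const { DataFrame } = require('dataframe-js');

// Hypothetical product listings, for illustration only
const listings = new DataFrame([
  { retailer: 'A', productId: '123', title: 'Wireless Mouse' },
  { retailer: 'B', productId: '123', title: 'Wireless Mouse v2' },
  { retailer: 'B', productId: '456', title: 'USB Keyboard' },
], ['retailer', 'productId', 'title']);

// Count how many retailers list each product id
const counts = listings
  .groupBy('productId')
  .aggregate(group => group.dim()[0])   // dim() returns [rows, columns]
  .rename('aggregation', 'retailerCount');

counts.show();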

Some other criteria to consider when choosing a library are:

  • Language support (NodeJS vs browser JS vs Typescript)
  • Dependencies (i.e. if it uses an underlying library)
  • Actively supported (active user-base, active source repository, etc)
  • Size/speed of JS library
  • Performance
  • Functionality/flexibility
  • Ease-of-use
  • Built-in visualisation functions

Other DataFrame manipulation libraries include:

📦 Lesson 4: Use the serverless-plugin-optimize plugin

Keeping project packages small in size is very important when making Serverless deployments. The image below shows the Python package sizes of some of the libraries most commonly used by data scientists. We can see how they eat up a great deal of the deployment size limits for AWS Lambdas and Google Cloud Functions.

Python package sizes of popular frameworks and cloud deployment limits. Photo by author

However, we can work within these limitations. ONNX converts models created with the likes of PyTorch, TensorFlow, and scikit-learn into a format that can be run with the relatively small ONNX Runtime package (~13MB). This is suitable for some cases, but add a hefty node_modules folder into the mix and it is still common to exceed the deployment limits.

The serverless-plugin-optimize plugin significantly decreases the Serverless package size. In our case, the plugin allowed us to package our model, dependencies, and code comfortably under the 50 MB zipped deployment limit for AWS Lambda.

To allow an AWS Lambda to access the onnxruntime-node package and the ONNX model, add the following lines to your serverless.yml file:

plugins:
  - serverless-plugin-optimize

custom:
  optimize:
    # Keep the native onnxruntime modules external and include the model file in the package
    external: ['onnxruntime-node', 'onnxruntime-common']
    includePaths: ['PATH_TO_ONNX_MODEL']

🚀 Lesson 5: Deploying onnxruntime-node to AWS

An important point to note when using onnxruntime-node is that where you are running your app determines how you should install the package. If it is not installed for the correct architecture and/or platform when deployed, you will see missing module errors thrown, like this:

Runtime.ImportModuleError: Error: Cannot find module '../bin/napi-v3/linux/x64/onnxruntime_binding.node'

Most native Node modules use node-pre-gyp, which uses an install script to search for pre-built binaries for your OS, architecture, and V8 ABI combination, and falls back to a native build if one is not available.

This means that a simple npm install onnxruntime-node will work when running locally, but when running inside a Serverless cloud function we need to explicitly install for the target environment.

In our case, the AWS Lambda we used had an x64 architecture and ran on a Linux machine, so the npm install command we had to run before deploying was:

npm install --arch=x64 --platform=linux onnxruntime-node

📅 Lesson 6: Scheduling

If you need your model to run automatically on a schedule rather than invoking it manually, try adding an EventBridge event to your Serverless config if you are using AWS. The schedule is set using either a cron expression or a rate expression.

The following is an example configuration that is added to the serverless.yml file:

events:
  - http:
      path: really-cool-onnx-project
      method: post
  - eventBridge:
      enabled: true
      schedule: cron(0/20 6-13 * * ? *) # Runs every 20 minutes between 6am and 2pm (UTC) every day
      input:
        stageParams:
          stage: prod

An important point to note: if your AWS Lambda function times out during an invocation from EventBridge, it will keep being invoked until execution completes.

Google Cloud Functions and Azure Functions both have cron-job based scheduling capability too, with Cloud Scheduler and Timer Trigger, respectively.

📈 Lesson 7: Scaling Serverless apps efficiently

Finding the best balance between performance and cost is a crucial aspect of running Serverless apps at scale.

In our use case, we generate predictions for tens of thousands of products per AWS Lambda invocation, which is a significant amount of data to process. For that reason, it was important for us to understand two aspects of our Lambda in order to optimise performance: execution time (avoiding timeouts) and cost-efficiency.

We tested our Serverless app with six different Lambda memory size configurations and measured the impact of each on execution time and cost.

We found that as the memory size doubled, the execution time roughly halved, until it plateaued at 4096MB and hit a point of diminishing returns. Using a higher allocated memory size also increases the amount of virtual CPU available to the Lambda function, which can actually reduce cost because it shortens the total execution time.

2048MB memory size ended up being the cheapest – surprisingly even cheaper than 256MB whilst also being approximately 10x faster. Therefore, 2048MB was the best balance between performance and cost for our use case.
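The reason a higher memory size can end up cheaper is that Lambda compute cost is billed as allocated memory multiplied by execution time. Here is a rough back-of-the-envelope sketch; the durations and the per-GB-second price are illustrative assumptions, not our measured figures.

// Lambda compute cost ~ allocated memory (GB) x execution time (s) x price per GB-second.
// The price and durations below are illustrative assumptions, not measured results.
const PRICE_PER_GB_SECOND = 0.0000166667;

function lambdaComputeCost(memoryMb, durationSeconds) {
  return (memoryMb / 1024) * durationSeconds * PRICE_PER_GB_SECOND;
}

// If doubling the memory roughly halves the duration, the cost per invocation stays
// flat (or drops) while the function finishes sooner, until the speed-up plateaus.
console.log(lambdaComputeCost(256, 80).toFixed(6));  // slow, and not actually the cheapest
console.log(lambdaComputeCost(2048, 8).toFixed(6));  // ~10x faster for a similar or lower cost
console.log(lambdaComputeCost(4096, 7).toFixed(6));  // little extra speed, almost double the cost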

Closing thoughts

I hope this post helps you when developing with ONNX and Serverless. At Bazaarvoice, we are championing these technologies by delivering Artificial Intelligence solutions using ONNX on a global scale. If you would like to find out more about our solutions with ONNX, check out this conference presentation by one of our machine learning engineers.

💻 If you have any questions, please reach out: matthewleyburn.com

Have you used ONNX? Let me know in the comments and tell me about your experiences!

