Introducing Profiler, by Auptimizer: Select the best AI model for your target device — no deployment required

Mohak Shah
Towards Data Science
5 min read · Jul 29, 2020

Profiler is a simulator for profiling the performance of Machine Learning (ML) model scripts. Profiler can be used during both the training and inference stages of the development pipeline. It is particularly useful for evaluating the performance and resource requirements of scripts that will be deployed to edge devices. Profiler is part of Auptimizer. You can get Profiler from the Auptimizer GitHub page or via pip install auptimizer.

Why we built Profiler

The cost of training machine learning models in the cloud has dropped dramatically over the past few years. While this drop has pushed model development to the cloud, there are still important reasons for training, adapting, and deploying models directly on devices. Performance and security are the big two, but cost savings are also an important consideration: the costs of transferring data, storing it, and building models for millions of devices add up quickly. Unsurprisingly, machine learning for edge devices, more commonly known as Edge AI, continues to go mainstream even as cloud compute becomes cheaper.

Developing models for the edge opens up interesting problems for practitioners.

  1. Model selection now has to take into account the resource requirements of each candidate model.
  2. The training-testing cycle becomes longer because there is a device in the loop: the model must be deployed to the device before its performance can be tested. This problem is only magnified when there are multiple target devices.

Currently, there are three ways to shorten the model selection/deployment cycle:

  • The use of device-specific simulators that run on the development machine and preclude the need for deployment to the device. Caveat: Simulators are usually not generalizable across devices.
  • The use of profilers that are native to the target device. Caveat: They need the model to be deployed to the target device for measurement.
  • The use of measures like FLOPS or Multiply-Add (MAC) operations to give approximate measures of resource usage. Caveat: The model itself is only one (sometimes insignificant) part of the entire pipeline, which also includes data loading, augmentation, feature engineering, etc. (see the sketch after this list).
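
To make that last caveat concrete, here is a minimal sketch of the kind of back-of-the-envelope MAC count these measures rely on, for a single convolutional layer. The layer dimensions are illustrative and not taken from any model in this article:

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out):
    """Multiply-Add (MAC) count for one Conv2d layer.

    Each output element requires c_in * kernel * kernel multiply-adds,
    and there are c_out * h_out * w_out output elements.
    """
    return c_in * c_out * kernel * kernel * h_out * w_out

# Illustrative layer: a 3x3 convolution mapping 64 to 128 channels
# over a 56x56 output feature map.
macs = conv2d_macs(c_in=64, c_out=128, kernel=3, h_out=56, w_out=56)
print(f"{macs / 1e6:.1f} M MACs")  # counts the model only, not the pipeline
```

Note that this number says nothing about data loading or preprocessing, which is exactly the caveat above.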

In practice, if you want to pick a model that will run efficiently on your target devices but do not have access to a dedicated simulator, you have to test each model by deploying it on every target device.

Profiler helps alleviate these issues. Profiler allows you to simulate, on your development machine, how your training or inference script will perform on a target device. With Profiler, you can understand the CPU usage, memory usage, and run-time of your model script on the target device.

How Profiler works

Profiler encapsulates the model script, its requirements, and the corresponding data into a Docker container. It uses user-supplied compute, memory, and framework constraints to build a corresponding Docker image so the script can run independently, without external dependencies. This image can then easily be scaled and ported to ease future development and deployment. As the model script executes within the container, Profiler tracks and records various resource utilization statistics, including Average CPU Utilization, Memory Usage, Network I/O, and Block I/O. The logger also supports setting the Sample Time to control how frequently Profiler samples utilization statistics from the Docker container.
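
Conceptually, this resembles launching a resource-constrained container and polling Docker's stats API at a fixed interval. The sketch below illustrates that idea using the Docker SDK for Python; it is not Profiler's actual implementation, and the image name and resource limits are placeholders:

```python
import time
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()

# Run the model script in a container with illustrative resource limits.
container = client.containers.run(
    "my-model-image",    # hypothetical image built from the model script
    detach=True,
    mem_limit="1g",      # memory constraint
    cpu_period=100_000,
    cpu_quota=50_000,    # roughly half a CPU
)

SAMPLE_TIME = 1.0  # seconds between samples, akin to Profiler's Sample Time

while True:
    container.reload()
    if container.status != "running":
        break
    s = container.stats(stream=False)  # one snapshot of utilization stats
    cpu_delta = (s["cpu_stats"]["cpu_usage"]["total_usage"]
                 - s["precpu_stats"].get("cpu_usage", {}).get("total_usage", 0))
    sys_delta = (s["cpu_stats"].get("system_cpu_usage", 0)
                 - s["precpu_stats"].get("system_cpu_usage", 0))
    cpu_pct = 100.0 * cpu_delta / sys_delta if sys_delta else 0.0
    mem_mib = s["memory_stats"].get("usage", 0) / 2**20
    print(f"cpu={cpu_pct:.1f}%  mem={mem_mib:.1f} MiB")
    time.sleep(SAMPLE_TIME)
```

Profiler builds on the same kind of container-level measurements but manages the image build, constraints, and logging for you.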

How Profiler helps

Our results show that Profiler can help users build a good estimate of model runtime and memory usage for many popular image/video recognition models. We conducted over 300 experiments across a variety of models (InceptionV3, SqueezeNet, ResNet18, MobileNetV2–0.25x, -0.5x, -0.75x, -1.0x, 3D-SqueezeNet, 3D-ShuffleNetV2–0.25x, -0.5x, -1.0x, -1.5x, -2.0x, 3D-MobileNetV2–0.25x, -0.5x, -0.75x, -1.0x, -2.0x) on three different devices: LG G6 and Samsung S8 phones, and the NVIDIA Jetson Nano. You can find the full set of experimental results and more information on how to conduct similar experiments on your own devices here.

The addition of Profiler brings Auptimizer closer to the vision of a tool that helps machine learning scientists and engineers build models for edge devices. The hyperparameter optimization (HPO) capabilities of Auptimizer help speed up model discovery. Profiler helps with choosing the right model for deployment. It is particularly useful in the following two scenarios:

  1. Deciding between models: The ranking of the run-times and memory usages of model scripts measured using Profiler on the development machine is indicative of their ranking on the target device. For instance, if Model1 is faster than Model2 when measured using Profiler on the development machine, Model1 will also be faster than Model2 on the device. This ranking is valid only when the CPUs are running at full utilization.
  2. Predicting model script performance on the device: A simple linear relationship relates the run-times and memory usage measured using Profiler on the development machine to the usage measured using a native profiling tool on the target device. In other words, if a model runs in time x when measured using Profiler, it will run in approximately time a*x + b on the target device, where a and b can be discovered by profiling a few models on the device with a native profiling tool (see the sketch after this list). The strength of this relationship depends on the architectural similarity between the models, but models designed for the same task are generally architecturally similar, as they are composed of the same set of layers. This makes Profiler a useful tool for selecting the best-suited model.
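
As a concrete illustration of the second scenario, here is a minimal sketch of recovering a and b from a handful of calibration models. The run-times below are invented for illustration and are not from our experiments:

```python
import numpy as np

# Hypothetical calibration data: run-times (in seconds) for a few models,
# measured with Profiler on the development machine (x) and with a
# native profiling tool on the target device (y).
profiler_times = np.array([0.8, 1.5, 2.3, 4.1])
device_times = np.array([2.1, 3.9, 5.8, 10.2])

# Fit the linear map: device_time ~ a * profiler_time + b.
a, b = np.polyfit(profiler_times, device_times, deg=1)

# Predict the on-device run-time of a new model that was profiled
# only on the development machine.
x_new = 3.0
print(f"predicted on-device run-time: {a * x_new + b:.2f} s")
```

The same procedure applies to memory usage, with memory measurements in place of run-times.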

Looking forward

Profiler continues to evolve. So far, we have tested its efficacy on select mobile and edge platforms running popular image and video recognition models for inference, but there is much more to explore. Profiler may have limitations for certain models or devices, which can lead to inconsistencies between Profiler's outputs and on-device measurements. Our experiment page provides more information on how to best set up your experiments with Profiler and how to interpret potential inconsistencies in the results. The exact use case varies from user to user, but we believe that Profiler is relevant to anyone deploying models on devices. We hope that Profiler's estimation capability enables leaner and faster model development for resource-constrained devices. We'd love to hear (via GitHub) if you use Profiler during deployment.

Authors: Samarth Tripathi, Junyao Guo, Vera Serdiukova, Unmesh Kurup, and Mohak Shah — Advanced AI, LG Electronics USA
