Deploying A Deep Learning Model on Mobile Using TensorFlow and React

We cover how to build a cross-platform mobile app (for both iOS and Android) using React Native and the TensorFlow.js React Native adapter.

Reshama Shaikh
Towards Data Science


Photo by Anna Pelzer on Unsplash

This project was completed jointly by Nidhin Pattaniyil and Reshama Shaikh.

As mobile phones have become more accessible, mobile use has increased. Users now reach for mobile devices more often than desktops, and demand for mobile apps is high. These internet-connected devices provide an opportunity to bring inference closer to the user.

Outline

  • About the Data: Food-101
  • PART 1: Training an Image Classifier Using TensorFlow
  • PART 2: Converting the Model
  • PART 3: Considerations for Inference: Running on Server vs. Client
  • PART 4: Deploying the Web App
  • PART 5: Deploying the Mobile App

About the Data: Food-101

This project uses the Food-101 dataset, which contains 101 food categories and a total of 101,000 images. Each class has 1,000 images: 250 manually reviewed test images and 750 training images. The categories of ETHZ Food-101 are the 101 most popular dishes from the food-picture-sharing website foodspotting.com.

Food-101 Dataset

Data citation

Bossard, Lukas; Guillaumin, Matthieu; Van Gool, Luc. "Food-101: Mining Discriminative Components with Random Forests." European Conference on Computer Vision (ECCV), 2014.

PART 1: Training an Image Classifier using TensorFlow

The image classifier was trained using TensorFlow 2.3.4. The code is available for reference in this notebook on GitHub. There are a number of pre-trained models available in TensorFlow 2 / Keras; we used the ResNet50 and MobileNetV2 architectures. The MobileNet model family is designed to be mobile-friendly, so MobileNetV2 is the one used for the mobile app.
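As a reference point, a minimal transfer-learning sketch in tf.keras (not the notebook's exact code; the input size, optimizer, dropout rate, and classification head are assumptions) looks like this:

import tensorflow as tf

NUM_CLASSES = 101        # Food-101 categories
IMG_SIZE = (224, 224)    # assumed input resolution

# Pre-trained MobileNetV2 backbone without its ImageNet classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False   # train the new head first; optionally unfreeze later to fine-tune

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=...)  # train_ds/val_ds: Food-101 input pipelines
model.save("model.h5")   # the Keras HDF5 file that is converted in Part 2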

Table 1: Compare Model Architectures

We present below a table comparing models with respect to accuracy, number of parameters, and model size.

MobileNetV2 is a mobile-friendly architecture with fewer parameters. The final model used in the mobile app deployment is a fine-tuned MobileNetV2 model with about 66% accuracy.

[Image by Author: table comparing the four model variants by parameter count and model size]

PART 2: Converting the Model (Quantization & Optimization)

We downloaded the model.h5 and classes.json (the list of food class names) files to our local computer, into a folder named model_tf, and did our model conversions within a virtual environment.

The model we created, model.h5, is in TensorFlow Keras (HDF5) format. We needed to convert it to the TensorFlow.js format, which is required for client-side inference. We adjusted the default TensorFlow.js conversion for a few reasons:

  • Model shards: Our model file is large, and the default TensorFlow.js conversion breaks the model into shards of roughly 5 MB. For the mobile app we need a single weights file, so we set weight_shard_size_bytes to 50,000,000 bytes to keep all the weights in one shard.
  • Inference-speed optimization using GraphModel conversion

How does GraphModel conversion speed up TensorFlow.js inference? It leverages TensorFlow (Python)'s ahead-of-time analysis of the model's computation graph at a fine granularity. The analysis is followed by modifications to the graph that reduce the amount of computation while preserving the numeric correctness of the graph's output.

  • Inference-speed optimization using quantization: Quantization is a post-training technique to reduce the size of models. Here, we quantize the default 32-bit weights to 16-bit precision, which reduces the model file size by half.

We combined all three conversion steps (graph-model conversion, 16-bit quantization, and a single weight shard) into one converter invocation.
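The original gist is not reproduced here; a rough sketch of that tensorflowjs_converter command, assuming a converter version that supports direct Keras-to-graph-model conversion and the file layout described above, would look like this:

# Sketch only: converts model_tf/model.h5 into a quantized TensorFlow.js graph model
# with a single weight shard. Older converter versions use --quantization_bytes 2
# instead of --quantize_float16.
tensorflowjs_converter \
    --input_format=keras \
    --output_format=tfjs_graph_model \
    --quantize_float16 \
    --weight_shard_size_bytes=50000000 \
    ./model_tf/model.h5 \
    ./model_tfjs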

We saved our converted model file and uploaded it to the GitHub releases area of our mobile repo: deploying-mobile-app/release

These model and classes files will be used as input when creating the web app.

Table 2: Model Size for MobileNetV2

Note that quantization reduced the model size by 50%, from 14 MB to 6.9 MB. Inference times are similar for both models. In most cases, the literature shows that 16-bit quantization does not significantly impact accuracy.

Web App: Compare Model Size Before & After Optimization and Quantization
╔═════════════════════════════════╦════════════╗
║ Model                           ║ Model Size ║
╠═════════════════════════════════╬════════════╣
║ MobileNetV2                     ║ 14 MB      ║
╠═════════════════════════════════╬════════════╣
║ MobileNetV2 (16-bit quantized)  ║ 6.9 MB     ║
╚═════════════════════════════════╩════════════╝

PART 3: Considerations for Inference: Running on Server vs. Client

Latency/Network connection of user

In server-side inference, once the image or text is received, the inference time is consistent; however, the total time is dominated by the user's network upload speed. In client-side inference, the inference time depends on the user's hardware.

Privacy

For sensitive data, users might be uncomfortable sending the data to a server. Client-side inference keeps the data on the user's device.

Future updates to models

A benefit of server-side inference is the ability to deploy new, state-of-the-art models and update them consistently. With client-side inference, deploying new models is limited by how often users update the app.

State-of-the-art models vs. mobile-optimized models

Due to client hardware and storage restrictions, small models optimized for inference are ideal. Most web pages ship less than 2 MB of JavaScript and CSS, while the simplest model we created was about 20 MB, which is still not ideal for serving on the web. For this reason, most models are currently served from a server.

PART 4: Deploying the Web App

We used this template repository to deploy the web app. The template repository is a single web app that does both server-side and browser-based inference:
deploying-web-app

We copied the model to our forked repo.

This is the original file from Colab: model.h5
This is the converted TensorFlow.js file: model_tfjs

The structure of the repository with the model files is as below. We placed the original and converted model in the directory backend/assets.

├── backend
│   ├── app.py
│   ├── assets
│   │   ├── classes.json
│   │   ├── model_tf
│   │   │   └── model.h5
│   │   └── model_tfjs
│   │       ├── group1-shard1of1.bin
│   │       └── model.json

Serving the Web App Locally

There are various options for serving web apps in Python, including Flask, Django, and FastAPI.

We served our browser app with FastAPI and used React for the frontend framework.
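The template repo's app.py is not shown here; a hypothetical, minimal FastAPI endpoint for server-side inference with the Keras model (the /predict route name, input size, and preprocessing below are assumptions, not the repo's actual code) could look like this:

# Hypothetical sketch, not the template repository's actual app.py.
import io, json
import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
model = tf.keras.models.load_model("backend/assets/model_tf/model.h5")
classes = json.load(open("backend/assets/classes.json"))   # list of food class names

@app.post("/predict")                                       # route name is an assumption
async def predict(file: UploadFile = File(...)):
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((224, 224))
    x = np.expand_dims(np.array(img) / 255.0, axis=0)       # must match training preprocessing
    probs = model.predict(x)[0]
    top = probs.argsort()[-3:][::-1]                        # indices of the top-3 classes
    return {"predictions": [
        {"class": classes[int(i)], "probability": float(probs[i])} for i in top
    ]}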

First, we ran the app locally using Docker.

The repo provides a Dockerfile to run the app. Run the commands below to launch the app; the first time you run with Docker, it could take up to 10 minutes. The commands need to be run at the root of your repo, where the Dockerfile is located.
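The exact commands are in the repo's README; a typical pair of commands (the image name food-web-app is a placeholder, and port 8000 matches the local URL mentioned below) would be:

docker build -t food-web-app .           # build the image from the Dockerfile in the repo root
docker run -p 8000:8000 food-web-app     # run the container and expose port 8000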

Running the above two commands starts a web server running locally on the machine.

The server can be accessed locally at http://localhost:8000.

Voilà! We have a web app running locally! It looks like the following demo.

Demo of Web App

This app is in production at: manning-deploy-imagenet.herokuapp.com


Serving the Web App to a Cloud Platform (Heroku)

Heroku is a nice, free option for deploying the app.

Once the Heroku command-line tools are installed, the app can be deployed using the commands below.

Replace APP_NAME with something unique. The steps below could take some time (5 to 10 minutes), depending on your internet upload speed. For our project, we set APP_NAME="manning-deploy-imagenet".
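The repo documents the exact steps; since the app is Dockerized, one common approach uses Heroku's container registry, roughly as follows (a sketch only; the repo may instead use a git-based deploy):

APP_NAME="manning-deploy-imagenet"             # replace with your own unique app name

heroku login
heroku create $APP_NAME
heroku container:login
heroku container:push web --app $APP_NAME      # build and push the Docker image
heroku container:release web --app $APP_NAME   # release it as the web dyno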

The app can be tried out here: manning-deploy-imagenet.herokuapp.com

Inference times

We measured the latency for several sample images. We experimented with images of different sizes and visited the site on desktop and mobile.

Table 3: Inference Times for Web App

Web App: Compare Inference Times
╔═════════════════╦═══════════════════╦═══════════════════╦════════════╗
║ Inference       ║ Desktop Browser   ║ Mobile Browser    ║ Model Size ║
║ Source          ║ duration          ║ duration          ║            ║
║                 ║ (Inference in ms) ║ (Inference in ms) ║            ║
╠═════════════════╬═══════════════════╬═══════════════════╬════════════╣
║ Server (Heroku) ║ 559 (202)         ║ 594 (180)         ║ 29 MB      ║
╠═════════════════╬═══════════════════╬═══════════════════╬════════════╣
║ Browser         ║ 50                ║ 177               ║ 6.9 MB     ║
╚═════════════════╩═══════════════════╩═══════════════════╩════════════╝
NOTES
The above results show the latency cost of sending an image to the server for inference.
* For the Server (Heroku) row, the model is the original Keras .h5 file.
* For the Browser row, the model is the post-training quantized and optimized model (converted with TensorFlow.js).

PART 5: Deploying the Mobile App

The benefit of having inference run natively in a mobile app is that it works without an internet connection. Because the model runs on the user's device, latency is lower. Running inference locally on the device also protects the user's privacy.

We used React Native and the TensorFlow.js React Native adapter because they allowed us to build a cross-platform mobile app.

We used this template repository to deploy the mobile app; it works for both iOS and Android: deploying-mobile-app

We uploaded the tfjs converted model file to the releases section.

Inference happens in the file ModelService.tsx. To initialize the TensorFlow.js model, ModelService.tsx runs code along the lines of the following in its create method.
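A sketch of that initialization, assuming the model.json and its single weight shard are bundled with the app (the asset paths and function signature below are placeholders, not the repo's exact code):

import * as tf from '@tensorflow/tfjs';
import { bundleResourceIO } from '@tensorflow/tfjs-react-native';

// Placeholder asset paths for the converted model and its single weight shard
const modelJson = require('../assets/model_tfjs/model.json');
const modelWeights = require('../assets/model_tfjs/group1-shard1of1.bin');

export async function create(): Promise<tf.GraphModel> {
  await tf.ready();                                                   // wait for the tfjs backend
  const model = await tf.loadGraphModel(bundleResourceIO(modelJson, modelWeights));
  tf.tidy(() => model.predict(tf.zeros([1, 224, 224, 3])));           // warm-up prediction
  return model;
}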

Below is the core logic used in classifyImage to get a prediction from an image.
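A sketch of that logic (the 224x224 input size, the [0, 1] scaling, and the top-3 output are assumptions that must match how the model was trained):

import * as tf from '@tensorflow/tfjs';
import { decodeJpeg } from '@tensorflow/tfjs-react-native';

// rawImageData: JPEG bytes of the photo; classes: the labels from classes.json
export async function classifyImage(
  model: tf.GraphModel,
  rawImageData: Uint8Array,
  classes: string[],
) {
  const scores = tf.tidy(() => {
    const img = decodeJpeg(rawImageData);                              // [height, width, 3] tensor
    const resized = tf.image.resizeBilinear(img.toFloat(), [224, 224]); // MobileNetV2 input size
    const batched = resized.div(255).expandDims(0);                    // [1, 224, 224, 3] in [0, 1]
    return model.predict(batched) as tf.Tensor;                        // [1, 101] class scores
  });

  const { values, indices } = tf.topk(scores, 3);                      // top-3 predictions
  const topProbs = await values.data();
  const topIdx = await indices.data();
  tf.dispose([scores, values, indices]);

  return Array.from(topIdx).map((classIdx, i) => ({
    className: classes[classIdx],
    probability: topProbs[i],
  }));
}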

Expo is a free and open-source toolchain built around React Native that helps you build cross-platform native iOS and Android projects using JavaScript and React. Node is the JavaScript environment commonly used for building these apps, and we use yarn to install JavaScript packages such as Expo. After yarn was installed, we installed the project dependencies and started the app:

yarn install
yarn run start

The next step is to open the Expo app; once you are logged in, you should see the app listed.


Demo of Mobile App

This mobile app is in production at:


Table 4: Comparing Inference Times for Mobile App

Note: We would expect inference in the mobile app to be faster than in the mobile browser, but it is a bit slower here. This could be because the React Native adapter is not yet mature.

Mobile App: Compare Inference Times
╔══════════════════╦═════════════════════╦═════════════════════╗
║ Device           ║ Mobile Browser      ║ Mobile App          ║
║                  ║ duration            ║ duration            ║
║                  ║ (Inference in ms)   ║ (Inference in ms)   ║
╠══════════════════╬═════════════════════╬═════════════════════╣
║ Pixel 1 (2016)   ║ 310                 ║ 411                 ║
╠══════════════════╬═════════════════════╬═════════════════════╣
║ iPhone XS (2018) ║ 220                 ║ 327                 ║
╚══════════════════╩═════════════════════╩═════════════════════╝
NOTES
Both the mobile browser and app allow you to run inference in less than 1 second. This performance is acceptable for a mobile app.

Summary

This article provides an outline for how to run a deep learning classifier using TensorFlow, and how to serve the model on both web and mobile. For more step-by-step instructions, check out our Manning liveProject: Deploying a Deep Learning Model on Web and Mobile Apps (using TensorFlow and React). The first milestone of this project, which focuses on training the model, is available publicly.

[Image by Author] liveProject: Deploying a Deep Learning Model

We focused our presentation on the TensorFlow stack because its ecosystem for deploying mobile apps is more mature than PyTorch's: PyTorch Mobile was released around October 2019, while TensorFlow Lite was released in May 2017.

Video

We presented on this topic at PyData Global 2021. There is a 30-minute video with Q&A discussion at the end.
