Deep Learning for Detecting Objects in an Image on Mobile Devices

Mobile Objects Detection App Development using Expo, React-Native, TensorFlow.js, and COCO-SSD

Yuefeng Zhang, PhD
Towards Data Science


Recently I published an article [1] to demonstrate how to use Expo [2], React [3], and React Native [4] to develop a multi-page mobile application that uses TensorFlow.js [5] and a pre-trained convolutional neural network model MobileNet for image classification on mobile devices.

As described in [1], React [3] is a popular JavaScript framework for building Web user interfaces. React Native inherits and extends the component framework (e.g., component, props, state, JSX, etc.) of React to support the development of native Android and iOS applications using pre-built native components such as View, Text, TouchableOpacity, etc. Native code in mobile platform-specific languages (e.g., Objective-C, Swift, Java, etc.) is typically developed using Xcode or Android Studio. To simplify mobile app development, Expo provides a framework and platform built around React Native and the native mobile platforms that allows us to develop, build, and deploy applications for iOS, Android, and the web from a single JavaScript/TypeScript codebase. Thus any text editor can be used for coding.

For machine learning mobile apps, TensorFlow.js for React Native enables us to train new machine learning and deep learning models and/or run pre-trained models for prediction and other machine learning purposes directly on mobile devices.

In this article, similarly to [1], I develop a multi-page mobile application to demonstrate how to use TensorFlow.js [5] and the pre-trained convolutional neural network model COCO-SSD [6][7], an object detection model based on SSD (Single Shot MultiBox Detector) [8], for detecting COCO (Common Objects in Context) [9] objects in an image on mobile devices.

Similarly to [1], this mobile application is developed on Mac as follows:

  • using Expo to generate a multi-page application template
  • installing libraries
  • developing mobile application code in React JSX
  • compiling and running

It is assumed that a recent version of Node.js has been installed on your local machine (e.g., a Mac).

1. Generating Project Template

In order to use the Expo CLI to generate a new project template automatically, the Expo CLI first needs to be installed (globally):

npm install -g expo-cli

Then a new Expo project template can be generated as follows:

expo init coco-ssd
cd coco-ssd

The project name is coco-ssd in this article.

As described in [1], I choose the tabs template of the Expo managed workflow to automatically generate several example screens and navigation tabs. The TensorFlow logo image file tfjs.jpg is used in this project, and it needs to be stored in the generated ./assets/images directory.
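Once stored there, the logo can be referenced from a screen with the standard React Native Image component. A minimal sketch (the component name, relative path, and size are illustrative; the actual styling lives in [10]):

import React from 'react'
import { Image } from 'react-native'

// Sketch: referencing the tfjs.jpg asset stored under ./assets/images
export const TfjsLogo = () => (
  <Image
    source={require('../assets/images/tfjs.jpg')}
    style={{ width: 200, height: 200 }} // illustrative size
  />
)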

2. Installing Libraries

The following libraries need to be installed for developing the objects detection app for mobile devices:

  • @tensorflow/tfjs, that is, TensorFlow.js, an open-source hardware-accelerated JavaScript library for training and deploying machine learning models.
  • @tensorflow/tfjs-react-native, a new platform integration and backend for TensorFlow.js on mobile devices.
  • @react-native-community/async-storage, an asynchronous, unencrypted, persistent, key-value storage system for React Native.
  • @tensorflow-models/coco-ssd, a pre-trained model that takes an image as input and returns an array of the most likely object class predictions, their confidences, and locations (bounding boxes).
  • expo-gl, provides a View that acts as an OpenGL ES render target, useful for rendering 2D and 3D graphics.
  • jpeg-js, a pure JavaScript JPEG encoder and decoder for Node.js

These libraries can be installed as follows:

npm install @react-native-community/async-storage @tensorflow/tfjs @tensorflow/tfjs-react-native expo-gl @tensorflow-models/coco-ssd jpeg-js

In addition, react-native-fs (native filesystem access for React Native) is required by @tensorflow/tfjs-react-native/dist/bundle_resource_io.js:

npm install react-native-fs

The expo-camera package (a React component that renders a preview of the device's front or back camera) is needed as well, since it is used in @tensorflow/tfjs-react-native/dist/camera/camera_stream.js:

expo install expo-camera

3. Developing Mobile Application Code

As described before, first I used Expo CLI to generate example screens and navigation tabs automatically. Then I modified the generated screens and added a new screen for detecting objects in an image. The following are the resulting screens:

  • Introduction screen (see Figure 2)
  • Objects detection COCO-SSD screen (see Figures 3 and 4)
  • References screen (see Figure 5)

There are three corresponding tabs at the bottom of the screen for navigation purposes; a sketch of the tab configuration is shown below.
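The Expo tabs template already includes the navigation dependencies and wires the screens into a bottom tab navigator. A minimal sketch of how the three screens could be wired up with react-navigation (screen and file names are illustrative, and the generated navigation code in [10] is more elaborate):

// Sketch of a bottom tab navigator for the three screens
// (this navigator is wrapped into the app container elsewhere in the template)
import { createBottomTabNavigator } from 'react-navigation-tabs'
import IntroScreen from '../screens/IntroScreen'
import CocoSsdScreen from '../screens/CocoSsdScreen'
import ReferencesScreen from '../screens/ReferencesScreen'

export default createBottomTabNavigator({
  Introduction: IntroScreen,
  'COCO-SSD': CocoSsdScreen,
  References: ReferencesScreen,
})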

This article focuses on the COCO-SSD screen class (see [10] for source code) for objects detection in an image. The rest of this section discusses the implementation details of objects detection.

3.1 Preparing TensorFlow, COCO-SSD Model, and Camera Access

The lifecycle method componentDidMount() is used to initialize TensorFlow.js, load the pre-trained COCO-SSD model, and obtain permission to access the camera roll on the mobile device after the user interface of the COCO-SSD screen is ready.

async componentDidMount() {
  await tf.ready(); // preparing TensorFlow
  this.setState({ isTfReady: true });
  this.model = await cocossd.load(); // preparing COCO-SSD model
  this.setState({ isModelReady: true });
  this.getPermissionAsync();
}

getPermissionAsync = async () => {
  if (Constants.platform.ios) {
    const { status } = await Permissions.askAsync(Permissions.CAMERA_ROLL)
    if (status !== 'granted') {
      alert('Please grant camera roll permission for this project!')
    }
  }
}
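The snippets in this section are methods of the COCO-SSD screen class. For completeness, they assume imports and initial state roughly along the following lines (a sketch; the exact file layout is in [10], and the Expo modules below may need to be installed with expo install expo-constants expo-permissions expo-image-picker):

// Sketch: imports and initial state assumed by the COCO-SSD screen snippets
import React from 'react'
import { View, Text, Image } from 'react-native'
import Constants from 'expo-constants'
import * as Permissions from 'expo-permissions'
import * as ImagePicker from 'expo-image-picker'
import * as jpeg from 'jpeg-js'
import * as tf from '@tensorflow/tfjs'
import * as cocossd from '@tensorflow-models/coco-ssd'
import { fetch } from '@tensorflow/tfjs-react-native'

export default class CocoSsdScreen extends React.Component {
  state = {
    isTfReady: false,    // TensorFlow.js initialized
    isModelReady: false, // COCO-SSD model loaded
    image: null,         // selected image source
    predictions: null,   // detection results
  }
  // componentDidMount(), getPermissionAsync(), selectImage(),
  // detectObjects(), imageToTensor(), and renderPrediction() go here
}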

3.2 Selecting Image

Once the TensorFlow.js library and the COCO-SSD model are ready, the method selectImage() is called to choose an image on the mobile device for object detection.

selectImage = async () => {
  try {
    let response = await ImagePicker.launchImageLibraryAsync({
      mediaTypes: ImagePicker.MediaTypeOptions.All,
      allowsEditing: true,
      aspect: [4, 3]
    })
    if (!response.cancelled) {
      const source = { uri: response.uri }
      this.setState({ image: source })
      this.detectObjects()
    }
  } catch (error) {
    console.log(error)
  }
}

3.3 Detecting Objects in an Image

Once an image has been chosen on the mobile device, the detectObjects() method is called to detect objects in the image.

In this method, first the fetch API for TensorFlow.js React Native is used to load the selected image on the mobile device. Then the method imageToTensor() is called to convert the loaded raw image data into a 3D image tensor. Finally, the prepared COCO-SSD model takes the 3D image tensor as input and generates a list of detected objects with their classes, probabilities, and locations (bounding boxes).

detectObjects = async () => {
  try {
    const imageAssetPath = Image.resolveAssetSource(this.state.image)
    // the fetch from @tensorflow/tfjs-react-native loads the image as binary data
    const response = await fetch(imageAssetPath.uri, {}, { isBinary: true })
    const rawImageData = await response.arrayBuffer()
    const imageTensor = this.imageToTensor(rawImageData)
    const predictions = await this.model.detect(imageTensor)
    this.setState({ predictions: predictions })
  } catch (error) {
    console.log('Exception Error: ', error)
  }
}

imageToTensor(rawImageData) {
  const TO_UINT8ARRAY = true
  const { width, height, data } = jpeg.decode(rawImageData, TO_UINT8ARRAY)
  // Drop the alpha channel info for COCO-SSD
  const buffer = new Uint8Array(width * height * 3)
  let offset = 0 // offset into original data
  for (let i = 0; i < buffer.length; i += 3) {
    buffer[i] = data[offset]         // R
    buffer[i + 1] = data[offset + 1] // G
    buffer[i + 2] = data[offset + 2] // B
    offset += 4 // skip the alpha channel
  }
  return tf.tensor3d(buffer, [height, width, 3])
}
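If desired, low-confidence detections can be filtered out before they are stored in the state and rendered. A minimal sketch that could replace the setState call inside detectObjects(), with an illustrative 0.5 score threshold (not part of the original code):

// Optional: keep only reasonably confident detections (illustrative threshold)
const confident = predictions.filter((prediction) => prediction.score > 0.5)
this.setState({ predictions: confident })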

Note that there are two fetch APIs available here: the standard fetch of React Native and the fetch provided by TensorFlow.js for React Native. The correct one to use is the latter, which can be imported as follows:

import { fetch } from '@tensorflow/tfjs-react-native'

3.4 Reporting Objects Detection Results

Once the object detection is done, the method renderPrediction() is called to display the detection results on the screen of the mobile device.

renderPrediction = (prediction, index) => {
  const pclass = prediction.class;
  const score = prediction.score;
  const x = prediction.bbox[0];
  const y = prediction.bbox[1];
  const w = prediction.bbox[2];
  const h = prediction.bbox[3];
  return (
    // the key belongs on the outermost element when rendered in a list
    <View style={styles.welcomeContainer} key={index}>
      <Text style={styles.text}>
        Prediction: {pclass} {', '} Probability: {score} {', '} Bbox: {x} {', '} {y} {', '} {w} {', '} {h}
      </Text>
    </View>
  )
}
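This method is invoked from the screen's render() method for each detected object, roughly as follows (a sketch; the full render() with styling is in [10]):

// Inside render(): map each detected object to a row on the screen
{this.state.isModelReady &&
  this.state.predictions &&
  this.state.predictions.map((prediction, index) =>
    this.renderPrediction(prediction, index)
  )}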

4. Compiling and Running Mobile Application

The mobile application in this article consists of a React Native application server and one or more mobile clients. A mobile client can be an iOS simulator, an Android emulator, an iOS device (e.g., iPhone or iPad), an Android device, etc. I verified the mobile application server on a Mac and mobile clients on both an iPhone 6+ and an iPad.

4.1 Starting React Native Application Server

As described in [1], the mobile app server needs to start before any mobile client can begin to run. The following commands can be used to compile and run the React Native application server:

npm install
npm start

If everything goes smoothly, a web interface as shown in Figure 1 should appear.

Figure 1: React Native application server.

4.2 Starting Mobile Clients

Once the mobile app server is running, we can start mobile clients on mobile devices.

Since I use Expo [2] for development in this article, the corresponding Expo client app is needed on mobile devices. The Expo client app for iOS mobile devices is available for free in the Apple App Store.

Once the Expo client app has been installed on an iOS device, we can use the camera on the mobile device to scan the QR code displayed by the React Native application server (see Figure 1), which opens the mobile application in the Expo client app.

Figure 2 shows the introduction screen of the mobile application on iOS devices (iPhone and iPad).

Figure 2: Introduction screen.

Figures 3 and 4 show two different scenarios of detecting objects in images. Figure 3 shows a screen of detecting a car and a truck in an image.

Figure 3: Detecting car and truck in an image.

The following is the output of the object detection:

Array [
  Object {
    "bbox": Array [
      61.6607666015625,
      700.927734375,
      230.8502197265625,
      185.11962890625,
    ],
    "class": "car",
    "score": 0.8818359375,
  },
  Object {
    "bbox": Array [
      292.78564453125,
      651.4892578125,
      279.60205078125,
      160.94970703125,
    ],
    "class": "truck",
    "score": 0.61669921875,
  },
]

Figure 4 shows a screen of detecting a person in an image.

Figure 4: Detecting a person in an image.

Figure 5 shows the screen of references.

Figure 5: References screen.

5. Summary

Similarly to [1], in this article, I developed a multi-page mobile application for detecting objects in an image on mobile devices using Expo [2], React JSX [3], React Native [4], TensorFlow.js for React Native [5], and a pre-trained convolutional neural network model COCO-SSD [6].

I verified the mobile application server on Mac and the mobile application clients on iOS mobile devices (both iPhone and iPad).

As demonstrated in [1] and this article, such a mobile app can potentially be used as a template for the development of other machine learning and deep learning mobile apps.

The mobile application project files for this article are available on GitHub [10].

References

  1. Y. Zhang, Deep Learning for Image Classification on Mobile Devices
  2. Expo
  3. React
  4. React Native
  5. TensorFlow.js for React Native
  6. COCO-SSD
  7. J. Huang, et al., Speed/accuracy trade-offs for modern convolutional object detectors
  8. W. Liu, et al., SSD: Single Shot MultiBox Detector
  9. T.Y. Lin, et al., Microsoft COCO: Common Objects in Context
  10. Y. Zhang, Mobile app project files on GitHub


Senior Data Scientist at Wavicle Data Solutions. He was previously a Senior Data Scientist at SMS Assist, a Senior Data Engineer at Capital One, and a DMTS at Motorola.