Turning a Raspberry Pi 3B+ into a powerful object recognition edge server with Intel Movidius NCS2

We turn the Raspberry Pi 3B+ into an object recognition server by deploying a MobileNet-SSD architecture for a fully working solution using the Intel OpenVINO platform.

Alexey Stern
Towards Data Science


The Intel NCS2 attached to a Raspberry Pi Model 3B+, the hardware used in this tutorial

In this part, we are going to use a pre-compiled neural network on the Intel Neural Compute Stick 2 so that the Raspberry Pi can receive Base64-encoded images and turn them into bounding-box predictions. Additionally, an example front end that sends camera input to the Pi will be provided. Please make sure to also check out the amazing write-up on how to deploy and test models by Mattio Varile.

  • A pre-trained and compiled model will be provided in an attachment.
  • Training and compiling the model for a custom dataset and more details on the front end will be part of another story, so stay tuned!
System diagram; the numbers indicate the flow of information. Click here for the draw.io diagram: https://drive.google.com/file/d/17yTw1YnhjOJh_EjIYLqGVYuYjnB8pKyl/view?usp=sharing

Update 1, 29.05.2019: system diagram now included
Update 2, 05.08.2019: pybase64 removed from requirements as not used

0. Requirements

  • Raspberry Pi 3B+
  • Intel Neural Compute Stick 2
  • Micro SD card with the NOOBS image

Optional

  • A USB webcam
  • Another computer for the front end to run on

1. Preparing the Raspberry Pi

1.1. Install the NOOBS image

Flash the NOOBS image onto a FAT32-formatted micro SD card: https://www.raspberrypi.org/downloads/

Boot up the Raspberry Pi normally, set an account password, connect to the internet, etc.

Make sure to also install Python 3, pip3 and wget:

sudo apt-get update
sudo apt-get install python3-picamera python3-pip wget

1.2. Install the latest Intel OpenVINO software

Download the OpenVINO toolkit

cd ~/Downloads && wget https://download.01.org/opencv/2019/openvinotoolkit/l_openvino_toolkit_raspbi_p_2019.1.094.tgz

I recommend following Intel's guide for installing OpenVINO on Raspbian up until (but not including) the section “Build and Run Object Detection Samples”.

After doing everything successfully, you should see the line [setupvars.sh] OpenVINO environment initialized whenever you open up a new terminal.

1.3. Deploying object detection on the neural compute stick

We are going to use a flask server which receives encoded images for prediction. You can find all of the code in the following GitHub repository: https://github.com/AlexeyGy/NCS2-server

a) First, create a new folder named detection_server in your home folder and switch into it.

mkdir ~/detection_server && cd ~/detection_server

b) Create a requirements.txt file with the following content. This file lists the packages that are needed.

flask-cors
flask

  • flask is the web server and flask-cors is a wrapper which adds the CORS headers (needed for cross-site requests)
  • Note that OpenCV (cv2) is not part of this package list, as it is installed together with OpenVINO in step 1.2. OpenVINO ships its own build of cv2 which includes support for its CNN architectures.

now run

pip3 install -r requirements.txt

to install the packages automatically.

c) Set up the server start script: create a file named RUN.SH.
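The script's full content is in the repository; a minimal sketch is given below, assuming the toolkit was unpacked into ~/Downloads as in step 1.2 (the inference_engine_vpu_arm folder name comes from that archive and may differ for other releases):

```shell
#!/bin/bash
# set up the OpenVINO environment (path from step 1.2)
source ~/Downloads/inference_engine_vpu_arm/bin/setupvars.sh
# start the flask detection server
python3 ~/detection_server/server.py
```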

d) Download the intermediate representation (IR) files of a pre-trained MobileNet-SSD model which can differentiate between screws and rawl plugs. Stay tuned for the second part of this tutorial, where we will cover training.

mkdir models && cd models && wget https://github.com/AlexeyGy/NCS2-server/raw/master/models/no_bn.bin && wget https://github.com/AlexeyGy/NCS2-server/raw/master/models/labelmap.prototxt && wget https://raw.githubusercontent.com/AlexeyGy/NCS2-server/master/models/no_bn.xml

You can see that the model consists of three files: a labelmap that contains the possible labels, a .bin file that contains the frozen network weights, and an .xml file that contains the network topology.

Intel IR model. Image courtesy of Intel https://software.intel.com/sites/default/files/managed/ed/e9/inference-engine-700w-300h.png

e) Now let's create the actual server: create the file server.py with the following content. More details on the individual functionalities are provided below.

  • on line 12, we read the provided model files using the Intel OpenVINO cv2.dnn.readNet function
  • line 14 sets the preferred target for our computation to run on
  • lines 17–19 contain some standard configuration for our flask server
  • line 23 uses the flask-cors wrapper to set the CORS header
  • lines 25–29 are optional; they filter out all incoming data which is not an image in the right format (jpg or png)
  • line 31 sets the default route for our flask server; we accept POST requests which contain an image
  • line 37 allows us to accept a threshold in addition to the image, so that all predictions scoring lower than the threshold are not returned
  • line 43 returns a JSON of the prediction results
  • the function in lines 46–50 does the actual image processing; we will get to the corresponding util_mobilnet.py file in a moment. Here is a high-level overview of what it does:
    — first, a preprocessing and scaling step is executed which is specific to the MobileNet-SSD architecture
    — then the network performs the inference (lines 48–49)
    — finally, a postprocessing step is executed, including the threshold filtering

f) Finally, let's create and look at the util_mobilnet.py file.

  • line 5 configures the dimensions that MobileNet requires; since it was trained on 300x300 square images, we set exactly that as our dimensions
  • the read_labels function reads the labelmap file line by line to define the supported classes
  • the preprocessing function in line 21 handles the colors and dimensions of the incoming image; the arbitrary-looking transformations are needed for MobileNet to process the image correctly
  • the postprocess function in line 32 goes over all predictions and filters out those below the threshold; additionally, the background prediction is not returned

3. Running the server and setting up auto-start on boot

To minimize resource usage on the Raspberry Pi, we set up a client-server application in which a flask server receives pictures and returns the bounding-box predictions.

I found that the best way to autostart flask is to a) add the flask startup to the .bashrc file in the pi home folder and b) set the system to boot to the CLI with autologin.

a) When adding the lines to the .bashrc file, make sure to change the “Downloads” folder to the folder where you downloaded the OpenVINO toolkit.
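A minimal sketch of what the appended lines might look like, assuming the folder layout from steps 1.2 and 1.3 (adjust the paths to your setup):

```shell
# appended to /home/pi/.bashrc
# adjust ~/Downloads if you unpacked the toolkit elsewhere
source ~/Downloads/inference_engine_vpu_arm/bin/setupvars.sh
python3 ~/detection_server/server.py &
```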

b) run the following line in a terminal

sudo raspi-config

You will see the following screens, where you should select the options 3 Boot Options / 1 Desktop CLI / 2 Console Autologin.

Select the options as per the screenshots.

Now, after booting, the Raspberry Pi will start the flask server automatically on port 5000. Congratulations!

4. Using a sample GUI to deploy our server

The following steps should be executed on another machine, but they can also be done on the Raspberry Pi if you choose to run the GUI on it (this will lead to lower performance).

Clone the repository:

git clone https://github.com/AlexeyGy/NCS2-frontend

Simply run the RUN.SH file to get a simple Python server running. You can access it via localhost:8000; you may have to give the page access to your camera before you see any images, though.

permission request

The server will POST webcam pictures to the address http://192.168.0.2:5000/. If you are on a machine other than your Raspberry Pi, make sure to change this to the address of your Raspberry Pi.
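If you want to test the Raspberry Pi server without the GUI, a small Python client can mimic what the front end sends. The data-URL format and the image/threshold form fields are assumptions about the server's API, not taken from the repository:

```python
import base64
import json
from urllib import parse, request


def encode_image(raw_bytes, mime="image/jpeg"):
    """Wrap raw image bytes in the Base64 data URL a webcam canvas produces."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)


def send_frame(path, url="http://192.168.0.2:5000/", threshold=0.5):
    """POST one image file to the detection server and parse the JSON reply."""
    with open(path, "rb") as f:
        data = parse.urlencode({
            "image": encode_image(f.read()),
            "threshold": threshold,
        }).encode("ascii")
    with request.urlopen(url, data=data) as resp:
        return json.loads(resp.read())
```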

I will do another post on how the JavaScript front end ensures that the bounding-box drawing is only refreshed when it receives a new response from our Raspberry Pi server.

The given sample GUI is from a pilot demonstrator at FIR which we used to showcase possible use cases for a smart contract platform. More on that in a later post.

You can debug the set-up via your browser’s JavaScript console.

5. Conclusions and outlook

Here is a gif of the whole system of front- and backend running together:

Example detection of screws and rawl plugs

The running system is able to deliver 60 fps video with about 8 fps of detections when run on two machines. The user experience is not impacted, as the predictions are displayed asynchronously by the JavaScript front end. I have actually found that too high a detection frame rate leads to a worse user experience, as the detections are updated too quickly!

With the onset of edge computing, and with other competitors like Google entering the race, we are in for some exciting future use cases for neural compute sticks.

Stay tuned for the training and front-end design part.
