Photo by Michael Dziedzic on Unsplash

A live AI web-server with Intel Neural Compute Stick and RaspberryPi

Filippo Valle
Towards Data Science
3 min read · Mar 5, 2021


Applications of Artificial Intelligence are endless. I gathered a RaspberryPi4 and an Intel Neural Compute Stick® and played with them. A Raspberry is a single-board computer with endless capabilities, but it is not powerful enough to run complex neural networks. Here the NCS comes in, enabling new possibilities. Connected together, they are a powerful instrument able to load complex neural network architectures in a small form factor and with low energy consumption.

I combined them to build an AI-enabled web server. From any device connected to the home network it is possible to watch a live stream from a webcam and get intelligent predictions based on the images shown.

The setup

Given a webcam, a RaspberryPi and an NCS, they can be assembled into a web server with built-in AI. The hardware connections are trivial; one only has to take care to attach the NCS to the Raspberry’s USB 3.0 port to obtain better performance.

The setup of this experiment: RaspberryPi4, Intel NCS2 and a webcam. Yes, that’s an Arduino, but it will be another story. Image by Author.

The software

This project is made of three different components, each characterised by a specific tool.

  • We need to capture images using the webcam. In this case, OpenCV will be useful;
  • When an image is captured, machine learning operations are performed using the OpenVINO framework to obtain a classification of what the webcam has seen;
  • Finally the image and the classification are streamed using Flask.

Shoot an image

Shooting an image is quite simple in Python using OpenCV.

import cv2

cap = cv2.VideoCapture(0) # open a connection to the webcam
ret, frame = cap.read() # shoot a single frame

One has to create a connection to the webcam and then just read data from it.

Machine learning

Given an image, the tricky part begins: the Raspberry has to ask the Neural Compute Stick to infer a label and get back the result.

The NCS can load pre-trained models, for instance the inception-v4 network. The model, together with its weights, can be obtained from the OpenVINO website.

Moreover, the labels of the output classes should be downloaded: imagenet_class_index.json
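For reference, that file maps each class index (as a string) to a [synset id, human-readable name] pair. A tiny excerpt with the same structure, to show how the entries look once loaded:

```python
import json

# two entries with the same structure as imagenet_class_index.json
excerpt = '{"0": ["n01440764", "tench"], "1": ["n01443537", "goldfish"]}'
labels_map = json.loads(excerpt)
print(labels_map["1"][1])  # -> goldfish
```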

import os
import json
from openvino.inference_engine import IECore

def load_model(self):
    model_xml = self.model
    model_bin = os.path.splitext(model_xml)[0] + ".bin"

    # Plugin initialization for the specified device
    ie = IECore()
    # Read the Intermediate Representation (IR)
    self.net = ie.read_network(model=model_xml, weights=model_bin)

    assert len(self.net.inputs.keys()) == 1, "Sample supports only single input topologies"
    assert len(self.net.outputs) == 1, "Sample supports only single output topologies"

    self.input_blob = next(iter(self.net.inputs))
    self.out_blob = next(iter(self.net.outputs))
    self.net.batch_size = len(self.input)

    # Load the model to the plugin
    self.exec_net = ie.load_network(network=self.net, device_name=self.device)

    # Load the class labels
    with open("/home/pi/inception/imagenet_class_index.json", 'r') as file:
        self.labels_map = json.load(file)

Infer what is in the image

Once the model and the labels are loaded into the Neural Compute Stick and an image is shot, it is simple to infer what is in front of the webcam.

res = self.exec_net.infer(inputs={self.input_blob: images})
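The infer call assumes the frame has already been resized and laid out as the network expects, and res then has to be turned back into a human-readable label. A minimal sketch of both steps — the helper names preprocess and top_label are mine, not the project's, and I assume an inception-v4-style input of 299×299 pixels in NCHW order:

```python
import numpy as np

def preprocess(frame):
    # assume frame is already resized to (299, 299, 3), e.g. with cv2.resize
    image = frame.transpose((2, 0, 1))   # HWC -> CHW, as OpenVINO expects
    return image[np.newaxis, ...]        # add the batch dimension -> (1, 3, 299, 299)

def top_label(res, out_blob, labels_map):
    probs = res[out_blob][0]             # scores for the single image in the batch
    class_id = int(np.argmax(probs))     # best-scoring class
    return labels_map[str(class_id)]     # imagenet_class_index.json keys are strings
```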

Server

So, at this point we have an image and a result res with information about what the webcam is looking at.

We can stream this through a web server. Using Flask it is quite simple.

import io
import cv2
from flask import Flask, Response

app = Flask(__name__)

def generator():
    global model
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        frame = model.process(frame)  # send the frame to the NCS, get it back with labels
        encode_return_code, image_buffer = cv2.imencode('.jpg', frame)
        io_buf = io.BytesIO(image_buffer)
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + io_buf.read() + b'\r\n')

@app.route("/video")
def video():
    return Response(generator(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

This generator function continuously yields new frames, which makes it possible to stream video.
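The multipart pattern can also be exercised without a webcam. A self-contained sketch that streams a fixed placeholder payload (the fake JPEG bytes are not a real image) and can be tested with Flask's test client:

```python
from flask import Flask, Response

app = Flask(__name__)

FAKE_JPEG = b'\xff\xd8fakejpegbytes\xff\xd9'  # placeholder payload, not a real image

def generator(frames):
    # each part of the multipart stream: boundary, Content-Type header, JPEG buffer
    for buf in frames:
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + buf + b'\r\n')

@app.route("/video")
def video():
    return Response(generator([FAKE_JPEG] * 3),
                    mimetype='multipart/x-mixed-replace; boundary=frame')
```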

Note the frame = model.process(frame) line, which sends the frame to the Compute Stick and gets it back with the labels drawn on it.

Connect to the newly created server

Using any device in the network and pointing a browser to http://raspberrypi.local we can see the whole system in action.

The system correctly identifies a bottle. Image by author.

Now to the Moon

At this point, the sky is the limit. This can be used as a home surveillance system, as the eye of a robot, as a portable object recogniser, and so on.

The code

Want to have fun with this? The code is on GitHub at https://github.com/fvalle1/ai_server


Interested in physics, ML application, community detection and coding. I have a Ph.D. in Complex Systems for Life Sciences