Smart Prosthetics with Object Detection using TensorFlow

Rohini Sharma
Towards Data Science
5 min read · Jun 16, 2019


During my time at NC State's Active Robotics Sensing (ARoS) Lab, I had the opportunity to work on a project for smarter control of an upper-limb prosthesis using computer vision techniques. A prosthetic arm would detect what kind of object it was trying to interact with, and adapt its movements accordingly.

(Image source: Newcastle University)

Similar work has been done at Newcastle University and by the winners of Microsoft’s Imagine Cup. In the case of the Microsoft project, image data was sent to the Azure cloud for object detection and classification using the Azure Custom Vision Service.

In our approach, we wanted to demonstrate how object detection and classification can be done at the edge, embedded on the prosthetic device itself. The system consisted of a prosthetic arm, an NVIDIA GPU, and a USB camera. The camera would send image frames to the GPU, which would identify the type of object in the frame and then send this information to the prosthetic arm. The arm could then move in a way that would allow it to best interact with the identified object.

I used OpenCV and TensorFlow to implement a single-shot multibox detector (SSD) trained on the Common Objects in Context (COCO) dataset. This program was deployed on an NVIDIA Jetson TX2 GPU to process the images from a camera attached to the prosthetic arm.

As part of the prototyping, OpenCV was used to implement computer vision techniques like Canny edge detection on sample image files. This algorithm takes an image, maps out its edges, and outputs a new image with only the edges showing.

An edge can be found by locating where the intensity of an image changes sharply. In Canny edge detection, this is done by computing the intensity gradient at each pixel. Then, to reduce noise, high and low thresholds are applied to determine which gradient responses are actually edges.

The gradient magnitude and direction can be calculated with the following formulas, where Gx and Gy are the horizontal and vertical derivatives of the image:

Edge_Gradient(G) = √(Gx² + Gy²)
Angle(θ) = tan⁻¹(Gy / Gx)

More details can be found at https://docs.opencv.org/master/da/d22/tutorial_py_canny.html
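
The still-image prototype comes down to a few lines of OpenCV. Here is a minimal sketch of that step (the file names are illustrative, not from the original project):

import cv2

# Read a sample image, run Canny edge detection with low/high thresholds
# of 100 and 200, and write out an image containing only the edges.
img = cv2.imread('sample.jpg')
edges = cv2.Canny(img, 100, 200)
cv2.imwrite('sample_edges.jpg', edges)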

This program was then enhanced to read frames from an incoming camera stream.

import cv2

# Open the USB camera stream (device index 1)
cap_stream = cv2.VideoCapture(1)
while cap_stream.isOpened():
    # Take each frame
    ret, frame = cap_stream.read()
    if not ret:
        break
    # Detect edges with low/high thresholds of 100 and 200
    edges = cv2.Canny(frame, 100, 200)
    cv2.imshow('edges', edges)
    # Stop when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap_stream.release()
cv2.destroyAllWindows()

Canny edge detection allowed manipulation of the camera feed, but we still needed to interpret the incoming images. A deep learning pipeline built on TensorFlow was used to perform object detection on the incoming camera feed. The pre-trained SSD Inception V2 COCO model, trained on the Common Objects in Context (COCO) dataset, was used.

A single-shot detector uses one network to both propose regions and classify them, as opposed to splitting these two tasks into separate stages. This method is preferred for embedded devices, as it is less computationally expensive.

Initial prototyping for object detection was done on a single still image: the pre-trained model was loaded, inference was run on the image, and OpenCV was used to draw a bounding box around each detected object and label it with a class name and confidence score.
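
Here is a sketch of that still-image step. It assumes the frozen ssd_inception_v2_coco graph and label map utilities from the TensorFlow Object Detection API; the file paths and the sample image name are illustrative, not the exact ones from the project:

import numpy as np
import cv2
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# Illustrative paths: frozen SSD Inception V2 COCO graph and the COCO label map
PATH_TO_GRAPH = 'ssd_inception_v2_coco/frozen_inference_graph.pb'
PATH_TO_LABELS = 'data/mscoco_label_map.pbtxt'

# Load the frozen detection graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

# Map COCO class ids to human-readable names
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=90, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Read the still image and convert BGR (OpenCV) to RGB
image_np = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num = detection_graph.get_tensor_by_name('num_detections:0')

    # Run detection on a single-image batch
    (boxes, scores, classes, num) = sess.run(
        [boxes, scores, classes, num],
        feed_dict={image_tensor: np.expand_dims(image_np, axis=0)})

# Draw labeled bounding boxes with confidence scores onto the image
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=8)

cv2.imwrite('output.jpg', cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR))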

To perform object detection in real time, the steps are fairly similar, except that the input comes from an OpenCV camera stream rather than a file, much like the real-time Canny edge detection above. Each frame from the camera stream is fed into the TensorFlow session so that objects can be identified. The output is a video feed with bounding boxes similar to those drawn on the still image.

The performance of this process was slower than expected, taking approximately 3 seconds per frame to identify objects and causing a lagging feed. To make the object detection more efficient, a separate thread was dedicated to handling camera I/O, yielding faster output (less than 1 second per frame).
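
The WebcamVideoStream and FPS helpers in the snippet below follow the API of the imutils library (that library is my assumption; the original post does not name it). The underlying idea is simply a background thread that keeps grabbing frames so the main loop never blocks on camera I/O. A rough sketch of that pattern:

import threading
import cv2

class ThreadedVideoStream:
    """Illustrative threaded capture: a daemon thread keeps reading frames
    so the main loop always gets the most recent one without blocking."""

    def __init__(self, src=1):
        self.stream = cv2.VideoCapture(src)
        self.grabbed, self.frame = self.stream.read()
        self.stopped = False

    def start(self):
        threading.Thread(target=self._update, daemon=True).start()
        return self

    def _update(self):
        # Continuously grab frames until stop() is called
        while not self.stopped:
            self.grabbed, self.frame = self.stream.read()
        self.stream.release()

    def read(self):
        # Return the latest frame grabbed by the background thread
        return self.frame

    def stop(self):
        self.stopped = True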

Here is a snippet of the TensorFlow code:

# WebcamVideoStream, FPS, detection_graph, category_index and vis_util are
# assumed to be imported/defined earlier in the full script
video_capture = WebcamVideoStream(src=1).start()
fps = FPS().start()

# Let TensorFlow grow GPU memory usage as needed instead of grabbing it all
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with detection_graph.as_default():
    with tf.Session(config=config, graph=detection_graph) as sess:
        while True:
            image_np = video_capture.read()

            # Input and output tensors for detection_graph
            image_tensor = detection_graph.get_tensor_by_name("image_tensor:0")
            detection_boxes = detection_graph.get_tensor_by_name("detection_boxes:0")
            detection_scores = detection_graph.get_tensor_by_name("detection_scores:0")
            detection_classes = detection_graph.get_tensor_by_name("detection_classes:0")
            num_detections = detection_graph.get_tensor_by_name("num_detections:0")

            # The model expects a batch dimension: [1, height, width, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})

            # Draw labeled bounding boxes and confidence scores on the frame
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            cv2.imshow('object detection', cv2.resize(image_np, (800, 600)))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break

This Python application was then loaded onto the GPU platform (running Linux) and tested. The video input was provided by a camera connected via USB.

Most of the challenges encountered during this project were related to installing and configuring the GPU platform with the libraries required to run the object detection application.

The project achieved the desired outcome of demonstrating object detection in near real time using deep learning on a GPU, showing the feasibility of embedding such a system in a prosthetic arm for better adaptability.

Further enhancements could include training a custom model for more accurate object detection instead of using the SSD model trained on COCO. The video capture could also be improved by adjusting exposure and brightness to reduce outlier frames in the incoming video, as sketched below.
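
As an example of the kind of tuning meant here, OpenCV lets you set capture properties on the camera device, although whether they take effect depends on the camera and its driver. A sketch with purely illustrative values:

import cv2

cap = cv2.VideoCapture(1)

# These property constants exist in OpenCV, but support is driver-dependent;
# the values below are illustrative only.
cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)  # switch to manual exposure on many UVC cameras
cap.set(cv2.CAP_PROP_EXPOSURE, -6)
cap.set(cv2.CAP_PROP_BRIGHTNESS, 128)

ret, frame = cap.read()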

Link to my GitHub project: https://github.com/rohinisharma/AROS-hsproject
