Implementing YOLOv3 in tensorflow python with tensornets

Anirudh S
Towards Data Science
4 min read · Feb 24, 2019


Spectacular architecture and a splendid view! source:https://pixabay.com/en/architecture-buildings-cars-city-1837176/

From object detection to Generative Adversarial Networks (GANs), deep learning keeps showing its prowess. Object detection has evolved from the good old manually engineered feature detectors to the present deep learning-based Convolutional Neural Network (CNN) object detectors such as R-CNN and YOLO. A CNN detector approximates an object’s location in an image by predicting its bounding-box coordinates, whereas segmentation goes a step further and predicts the boundaries of the objects in the image. In this article, we’ll walk through the steps to run a vehicle-detection network with YOLOv3 trained on the MS-COCO dataset, which can detect 80 different classes of objects. With this network, we’ll be able to detect and track cars, buses, trucks, bikes, people and many more! To find more interesting AI articles, dive right here.

1. Getting acquainted with tensornets

Downloading the Darknet weights of YOLOv3 and making them run on TensorFlow is quite a tedious task. But we are about to do just that in 2 minutes! How, you ask?

Well, Mr Taehoon Lee took the pains to convert the weights of various popular networks into TensorFlow’s format and has released a PyPI library called ‘Tensornets’. Tensornets makes it possible to do transfer learning and run inference in just ’10 lines’ of intuitive code.

Check out his Github page: https://github.com/taehoonlee/tensornets

Some of the models available in tensornets

2. Loading YOLO

You Only Look Once source:https://pixabay.com/photos/yolo-sparklers-new-year-1758212/

YOLOv3 is an improved version of YOLOv2 with a higher mAP (mean average precision) score, which is the main reason we choose v3 over v2.

Let’s get rolling.

First, we need to install the ‘tensornets’ library, which is easy with pip: ‘pip install tensornets’ will do, but you can also install it by pulling it from GitHub. Make sure that you have TensorFlow installed before you start working your magic with the code. The snippets in this article use the TensorFlow 1.x API (tf.placeholder and tf.Session).

Fire up your favourite IDE and import tensorflow and tensornets. Along with that, we’d need OpenCV and numpy to help with image and video import. We use ‘time’ to monitor the time the network takes to process one frame.

import tensorflow as tf
import tensornets as nets
import cv2
import numpy as np
import time

Once we import the necessary libraries, we go on to create the input placeholder for the network and the model itself.

inputs = tf.placeholder(tf.float32, [None, 416, 416, 3]) 
model = nets.YOLOv3COCO(inputs, nets.Darknet19)

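These two lines do the laborious task of defining the input and building the network graph; the pretrained weights are then loaded inside the session with model.pretrained(). “Just two” lines!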
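The None in the placeholder shape is the batch dimension, so even a single frame needs a leading axis before it is fed to the network. Here is a minimal NumPy sketch of that step, using a zero-filled array as a stand-in for a real resized frame. The to_batch helper is my own illustrative name and is not part of tensornets.

```python
import numpy as np

def to_batch(frame, size=416):
    """Add a leading batch axis to one size x size x 3 frame (hypothetical helper)."""
    frame = np.asarray(frame)
    if frame.shape != (size, size, 3):
        raise ValueError("expected a %dx%dx3 frame, got %s" % (size, size, frame.shape))
    return frame[np.newaxis, ...]  # shape becomes (1, size, size, 3)

# Stand-in for a video frame that has already been resized to 416x416
dummy_frame = np.zeros((416, 416, 3), dtype=np.uint8)
batch = to_batch(dummy_frame)
print(batch.shape)
```

The reshape(-1, 416, 416, 3) call used later in the article achieves the same thing; the explicit shape check here only makes a wrongly sized frame fail early.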

3. Running inference

Now it’s time to create a TensorFlow session and run inference on a video. The lines below define the classes of objects that we want to track and their MS-COCO indices.

# MS-COCO indices (as strings) mapped to display labels;
# index 3 is the motorcycle class, labelled 'bike' here
classes = {'0': 'person', '1': 'bicycle', '2': 'car', '3': 'bike', '5': 'bus', '7': 'truck'}
# To display other detected objects, change classes and list_of_classes to the
# respective COCO indices (index 0 is person, 1 is bicycle and so on).
# To detect all the classes, add every index to this list.
list_of_classes = [0, 1, 2, 3, 5, 7]

with tf.Session() as sess:
    sess.run(model.pretrained())  # load the pretrained weights

    # change the path to your own video, or pass the integer 0 for a webcam
    cap = cv2.VideoCapture("D://pyworks//yolo//videoplayback.mp4")
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no frame returned: end of video or read error
            break
        img = cv2.resize(frame, (416, 416))
        imge = np.array(img).reshape(-1, 416, 416, 3)
        start_time = time.time()
        preds = sess.run(model.preds, {inputs: model.preprocess(imge)})
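The hand-written dictionary and index list above can also be derived from an ordered list of class names, which avoids typing indices by hand. Below is a small sketch, assuming the Darknet ordering of the COCO labels, of which only the first eight are listed. COCO_NAMES_HEAD, WANTED and build_class_lookup are illustrative names of my own, and the label at index 3 is spelled 'motorbike' here rather than 'bike'.

```python
# First eight labels in the Darknet COCO ordering (index = position in the list).
COCO_NAMES_HEAD = ["person", "bicycle", "car", "motorbike",
                   "aeroplane", "bus", "train", "truck"]

# Labels we want to keep (an assumption matching the classes used in this article)
WANTED = {"person", "bicycle", "car", "motorbike", "bus", "truck"}

def build_class_lookup(names, wanted):
    """Return ({str(index): label}, [index, ...]) for the wanted labels."""
    classes = {str(i): name for i, name in enumerate(names) if name in wanted}
    list_of_classes = sorted(int(k) for k in classes)
    return classes, list_of_classes

classes, list_of_classes = build_class_lookup(COCO_NAMES_HEAD, WANTED)
print(list_of_classes)  # [0, 1, 2, 3, 5, 7]
```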

4. The result

Moving on! Once the video frame is fed to the network, it returns the predictions, which we store in ‘preds’. Now we can get the detected classes and their bounding-box coordinates from ‘preds’.

        print("--- %s seconds ---" % (time.time() - start_time))  # time per frame
        # indexed by class; each row is used as [x1, y1, x2, y2, confidence]
        boxes = model.get_boxes(preds, imge.shape[1:3])
        cv2.namedWindow('image', cv2.WINDOW_NORMAL)
        cv2.resizeWindow('image', 700, 700)

        for j in list_of_classes:  # iterate over the classes we care about
            count = 0
            lab = classes.get(str(j), str(j))
            # iterate over the detections of class j
            for box in boxes[j]:
                # setting confidence threshold as 40%
                if box[4] >= .40:
                    count += 1
                    # OpenCV drawing functions need integer pixel coordinates
                    x1, y1, x2, y2 = [int(v) for v in box[:4]]
                    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 3)
                    cv2.putText(img, lab, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), lineType=cv2.LINE_AA)
            print(lab, ": ", count)

        # Display the output
        cv2.imshow("image", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # free the video source and close the window once the loop ends
    cap.release()
    cv2.destroyAllWindows()

This displays the tracked vehicles with the nametags in a new window!
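The thresholding step can be pulled out of the drawing loop and tried without a video or a network. This is a minimal sketch, assuming boxes is a list indexed by class id whose entries are arrays of rows [x1, y1, x2, y2, score], which is how the loop above treats it. The filter_detections name and the sample numbers are made up for illustration.

```python
import numpy as np

def filter_detections(boxes, class_ids, threshold=0.40):
    """Collect (class_id, x1, y1, x2, y2, score) for boxes at or above threshold.

    `boxes` is assumed to be a list indexed by class id, where each entry is an
    array of rows [x1, y1, x2, y2, score].
    """
    kept = []
    for cid in class_ids:
        rows = np.asarray(boxes[cid], dtype=float).reshape(-1, 5)
        for x1, y1, x2, y2, score in rows[rows[:, 4] >= threshold]:
            # OpenCV drawing functions expect integer pixel coordinates.
            kept.append((cid, int(x1), int(y1), int(x2), int(y2), float(score)))
    return kept

# Made-up detections for three classes, only to exercise the function.
fake_boxes = [
    np.array([[10.0, 20.0, 110.0, 220.0, 0.91]]),       # class 0
    np.zeros((0, 5)),                                    # class 1: nothing found
    np.array([[5.5, 6.5, 50.5, 60.5, 0.35],
              [200.0, 100.0, 300.0, 180.0, 0.62]]),      # class 2
]
detections = filter_detections(fake_boxes, [0, 1, 2])
for det in detections:
    print(det)
```

Keeping the filter separate from cv2.rectangle and cv2.putText makes it easy to change the 40% threshold or to count objects per class without touching the display code.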

Et voila!

And that’s how it’s done!

On an Nvidia GTX 1060 6GB, YOLOv3 runs at around 12 fps, and it can go up to 30 fps on an Nvidia Titan. With the rise of powerful edge computing devices, YOLO might substitute for MobileNet and other compact object detection networks that are less accurate than YOLO. Convolutional networks can do more than just object detection: semantic segmentation, image generation, instance segmentation and more.
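The per-frame time printed in the loop turns into frames per second by dividing the number of frames by the total time spent on them. Here is a rough sketch of such a running average. FpsMeter is a hypothetical helper of my own, and time.sleep stands in for the sess.run call on one frame.

```python
import time

class FpsMeter:
    """Running average of frames per second (illustrative helper)."""

    def __init__(self):
        self.frames = 0
        self.elapsed = 0.0

    def add(self, seconds):
        """Record the processing time of one frame, in seconds."""
        self.frames += 1
        self.elapsed += seconds

    @property
    def fps(self):
        # Avoid dividing by zero before any frame has been timed
        return self.frames / self.elapsed if self.elapsed > 0 else 0.0

meter = FpsMeter()
for _ in range(3):
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for running the network on one frame
    meter.add(time.perf_counter() - start)
print("average fps: %.1f" % meter.fps)
```

Averaging over many frames gives a steadier number than a single reading, since the first few frames often take longer than the rest.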

Jump here to dive deep into the YOLO paper and understand its elegant architecture and working.

The nitty-gritty details of the YOLO paper interpreted

Machine learning surely is transforming the landscape of the digital world. And almost every industry will be impacted by AI soon.

To explore more into AI, dive into HackerStreak

Find the entire code on GitHub https://github.com/Baakchsu/Vehicle-and-people-tracking-with-YOLOv3-

Have any questions? Shoot at me.

Originally published at https://hackerstreak.com
