Velocious Vehicles and How to Track Them

The Road to becoming a self driving car engineer: Part 1

A view of NVIDIA’s self driving car technology

Since graduating I’ve made it a goal to explore topics I didn’t get the chance to cover in class. I read two articles a day about the latest in AI technologies (machine learning, deep learning, AI in gaming, the list goes on). Reading about OpenCV and self driving cars inspired me to spend this past month practicing OpenCV on some side projects. Right now self driving cars are HOT! There are probably at least 15 other posts about how enthusiasts have built their own. Still, I’m excited and honored to be the nth person to share my attempt. I came across Udacity’s Self Driving Car Nanodegree program after writing a short vehicle detection script, and realized there was more I could do to improve the algorithm…

Basic vehicle detection code
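A minimal sketch of a basic, cascade-based detector along these lines; the cars.xml cascade file and test image path are placeholders, since OpenCV doesn’t bundle a vehicle cascade:

```python
import cv2

# Placeholder: a third-party Haar cascade trained on cars (not bundled with OpenCV)
car_cascade = cv2.CascadeClassifier('cars.xml')

# Placeholder test image of a road scene
image = cv2.imread('test_road.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Each detection is returned as an (x, y, width, height) box
cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)

for (x, y, w, h) in cars:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)

cv2.imwrite('detected.jpg', image)
```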

I’m not enrolled in the program, so I’m completely on my own here. I still thought it would be fun to implement my own solutions based on what I learned from my research!

Data and Algorithms

The goal of self driving cars is to create vehicles that can pilot themselves in any possible road condition, especially when surrounded by high-speed drivers. Research suggests that the following pipeline yields fast, accurate results with an SVM classifier:

  • Perform Histogram of Oriented Gradients (HOG) feature extraction by breaking the images into regions and training the classifier on local gradient and edge information.
  • Perform color space conversion and spatial binning so images of all resolutions and sizes can be processed consistently.
  • Implement a sliding window algorithm to search for vehicles in images with the trained classifier.
  • Estimate and draw the final bounding box for detected vehicles.

The vehicle dataset used was from the GTI vehicle image database, and the road test video and images were from the KITTI vision benchmark suite. Udacity also released a labeled dataset that might be used for later components of this project.

Data Prep

I started by organizing the images. I made lists grouping all the tagged images, and then separate lists for the training data to avoid overfitting. This was just to validate that the data was organized correctly.

Side by side comparison between a vehicle and the road
Side by side comparison between a vehicle and an aerial image
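The bookkeeping itself is just a matter of collecting file paths into lists; here’s a minimal sketch (the directory layout below is an assumption, not the exact structure used here):

```python
import glob

# Assumed layout: GTI/KITTI images sorted into vehicle and non-vehicle folders
vehicle_paths = glob.glob('data/vehicles/**/*.png', recursive=True)
non_vehicle_paths = glob.glob('data/non-vehicles/**/*.png', recursive=True)

# Quick sanity check that both classes were picked up
print(f'{len(vehicle_paths)} vehicle images, {len(non_vehicle_paths)} non-vehicle images')
```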

Feature Extraction

The idea of a Histogram of Oriented Gradients (HOG) is that local objects can be described and detected by the distribution of light intensity gradients or edge directions. The image is broken up into small regions (cells), and each cell produces a histogram of the gradient directions (or edge orientations) of its pixels; concatenating these histograms forms the descriptor. To improve accuracy, the local histograms can be contrast-normalized over a larger area of the image (known as a block): a measure of light intensity is computed across the block, and that value is used to normalize all the cells within it.

HOG feature extraction example.

But what kinds of features are useful for this particular classification task? Let’s discuss this point using an example. Suppose we want to build an object detector that detects buttons on shirts and coats. A button is circular and usually has a few holes for sewing. If you run an edge detector on the image of a button, you can easily tell it’s a button by looking at the edge image alone. In this case, edge information is “useful” and color information is not. In addition, the features also need to have discriminative power. For example, good features extracted from an image should be able to tell the difference between buttons and other circular objects like coins and car tires. This is where the light intensity gradients described above become useful.

In the HOG feature descriptor, the distribution (histograms) of gradient directions (oriented gradients) is used as the feature. Gradients of an image are useful because their magnitude is large around edges and corners (regions of abrupt intensity change), and edges and corners pack in a lot more information about object shape than flat regions do. That’s why the HOG descriptor was originally developed for human detection, and why it works well for other objects with distinctive outlines, like vehicles.
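Here’s a minimal sketch of per-image HOG extraction with scikit-image; the parameter values (9 orientations, 8×8 cells, 2×2 blocks) are common defaults and should be treated as illustrative rather than the exact settings tuned for this project:

```python
import cv2
from skimage.feature import hog

def extract_hog(image_bgr, orientations=9, pix_per_cell=8, cell_per_block=2):
    """Compute a HOG feature vector (and a visualization image) for one image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    features, hog_image = hog(
        gray,
        orientations=orientations,
        pixels_per_cell=(pix_per_cell, pix_per_cell),
        cells_per_block=(cell_per_block, cell_per_block),
        block_norm='L2-Hys',
        visualize=True,       # also return the visualization used in the figures below
        feature_vector=True,  # flatten the block histograms into one vector
    )
    return features, hog_image
```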

Notice the vehicle and road edges detected in the HOG images. These same features are highlighted in the additional images below:

Road and Vehicle HOG
Road and Vehicle features
Road and Vehicle HOG with lower resolution pictures
Road and Vehicle features with lower resolution pictures

Training a Classifier

I used a Linear Support Vector Machine (SVM) as the classifier. SVMs are a set of supervised learning methods for classification, regression, and outlier detection. This was a suitable classifier since we had labeled data to work with.

I stored the extracted features from the images of vehicles and non-vehicles and used those features as the training and test data. I used helper functions from scikit-learn to split the data for training.
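A minimal sketch of that training step, assuming the per-image feature vectors have been collected into two lists; the StandardScaler, the 80/20 split, and the function name below are illustrative choices rather than details spelled out here:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

def train_classifier(car_features, notcar_features):
    """Train a linear SVM on stacked vehicle / non-vehicle feature vectors."""
    X = np.vstack((car_features, notcar_features)).astype(np.float64)
    y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

    # Scale features so no single feature group dominates the margin
    scaler = StandardScaler().fit(X)
    X_scaled = scaler.transform(X)

    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.2, random_state=42)

    svc = LinearSVC()
    svc.fit(X_train, y_train)
    print('Test accuracy:', round(svc.score(X_test, y_test), 4))
    return svc, scaler
```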

The classifier achieved 98.17% accuracy. Now we can start testing!

Visualizations

A sliding window is a box with a fixed height and width that slides across an image from left to right and top to bottom. At each position, the classifier we trained is applied to that region and looks for the desired object.

Sliding window example looking for a face, courtesy of pyimagesearch.com

This is a common technique in computer vision because it lets us detect objects at different scales and locations, which is exactly what we need on the road.
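A minimal sketch of the window generator (a single 64×64 window size with 50% overlap is assumed here; in practice several scales would be searched):

```python
def slide_window(img_shape, window=(64, 64), overlap=0.5):
    """Yield (x1, y1, x2, y2) coordinates of windows covering the image."""
    height, width = img_shape[:2]
    step_x = int(window[0] * (1 - overlap))
    step_y = int(window[1] * (1 - overlap))
    for y in range(0, height - window[1] + 1, step_y):
        for x in range(0, width - window[0] + 1, step_x):
            yield (x, y, x + window[0], y + window[1])

# Usage: extract features from each patch and ask the trained SVM for a prediction
# for (x1, y1, x2, y2) in slide_window(frame.shape):
#     patch = frame[y1:y2, x1:x2]
#     if svc.predict(scaler.transform([extract_hog(patch)[0]]))[0] == 1:
#         detections.append((x1, y1, x2, y2))
```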

Bounding boxes drawn with a Sliding Window algorithm. It’s clear that there are some false positives

The classifier returned positive detections in the six test images above, but there are some false positives on the left side of the road. A suggested alternative was to try using a heat map. The idea behind heat maps is that data and patterns can be effectively visualized by assigning colors to values rather than simply looking at the images.

Heat Maps

Heat map of detected vehicles

For each pixel inside a positive bounding box, heat is added, which highlights the pixels that make up a detected vehicle. The mapping and thresholding algorithm counts how many times each pixel is covered by a detection: the more overlapping boxes, the more heat is added and the lighter the hue in the map. The heat maps are definitely easier to decipher, especially when it comes to spotting false positives, while the raw bounding boxes are more obtrusive.
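A minimal sketch of the heat map step (the threshold value is illustrative; scipy’s label function is one common way to turn the surviving hot regions into final boxes):

```python
import numpy as np
from scipy.ndimage import label

def add_heat(heatmap, bbox_list):
    """Add +1 to every pixel inside each positive detection box."""
    for (x1, y1, x2, y2) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold=2):
    """Zero out pixels that were detected too few times (likely false positives)."""
    heatmap[heatmap <= threshold] = 0
    return heatmap

# Example: build a heat map for one frame and group hot regions into vehicles
# heat = add_heat(np.zeros(frame.shape[:2]), detections)
# heat = apply_threshold(heat, threshold=2)
# labeled_regions, n_vehicles = label(heat)
```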

Results

Detection results

I used imageio and moviepy to smooth out the frames of the test video before applying the detection algorithm. Each detected object is stored in a queue. Every time we detect a vehicle in the current frame or a later frame of the clip, we check whether we’ve detected a similar object in past frames. If we have, we append the new detection and increase that object’s count across multiple frames, so only detections that persist over time survive. The gif below shows a sample of a successful vehicle detection run:

Vehicle detection demo
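One common way to implement that kind of frame-to-frame smoothing is to keep a short history of per-frame heat maps and threshold their sum; the class below is an illustrative sketch (the name and the eight-frame window are assumptions, not the exact bookkeeping used here):

```python
from collections import deque
import numpy as np

class DetectionHistory:
    """Keep the last few frames' heat maps so detections must persist to count."""

    def __init__(self, n_frames=8):
        self.heatmaps = deque(maxlen=n_frames)  # old frames fall off automatically

    def add(self, heatmap):
        self.heatmaps.append(heatmap)

    def combined(self):
        """Sum the recent heat maps; threshold this to drop one-frame flukes."""
        if not self.heatmaps:
            return None
        return np.sum(np.stack(self.heatmaps), axis=0)
```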

Conclusions

When I first attempted this project I only used a pre-trained vehicle cascade. I learned so much from this latest attempt and was able to produce a demo with fewer false positives. It was interesting to use conventional machine learning techniques, and I would love to implement a deep learning solution next (PyTorch is my forte, but I’m compelled to try TensorFlow’s object detection API after reading this post). I won’t be using Udacity’s exact curriculum, but I’m excited to continue building this project. Stay tuned for more work on future components!

