How to Build and Deploy a Custom Object Detector & Classifier using TensorFlow Object Detection API?

Learn how to build a traffic light detector and classifier that is used in programming a real self-driving car.

Sandeep Aswathnarayana
Towards Data Science


Traffic Light Classifier: Site (left) and Simulator (right)

Motive:

Implement a traffic light classifier using the TensorFlow Object Detection API. This API can be used to detect objects in images and/or video, with bounding boxes, using either the pre-trained models made available or models you train on your own.

Application:

Programming a real Self-Driving Car. An attempt to solve the problem of Vision & Perception in autonomous vehicles.

Code & Implementation:

Please find all the necessary files, code, modules, and results pertinent to this blog post on my GitHub repository.

This traffic light classifier is a pivotal part of Udacity’s Self-Driving Car Engineer Nanodegree Capstone. I led a team of four (collaborating with Arun Sagar, Malith P. Ranaweera, and Shuang Li), each member bringing a different background and experience, in programming a real self-driving car. Please find this project repository on GitHub.

Why choose TensorFlow Object Detection API?

TensorFlow’s Object Detection API is a powerful tool that makes it easy to construct, train, and deploy object detection models. In most cases, training an entire convolutional network from scratch is time-consuming and requires large datasets. This problem can be solved by taking advantage of transfer learning with a pre-trained model through the TensorFlow API.

Building a Classifier:

Getting started with the installation: Find the instructions in the TensorFlow Object Detection API repository under the path tensorflow/models/object_detection/g3doc/installation.md.

  • Clone or download the TensorFlow Models repository. Navigate to that directory in your terminal/cmd.exe
  • Go to https://github.com/protocolbuffers/protobuf/releases/tag/v3.4.0 and download protoc-3.4.0-win32.zip (choose an appropriate one based on your OS and requirements)
  • Extract the two files downloaded in the previous steps. Now, from within the models (or models-master) directory, you can use the protoc command:

On Windows:

"C:/Program Files/protoc/bin/protoc"object_detection/protos/*.proto --python_out=.

On Ubuntu:

protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
  • Run the Jupyter notebook object_detection_tutorial.ipynb. This downloads a pre-trained model for you; the model used here is pre-trained on COCO (Common Objects in Context). In the notebook, comment out the get_ipython().magic('matplotlib inline') line.
  • Next, bring in the Python OpenCV wrapper using import cv2
  • The actual detection process takes place in the ‘for’ loop (in the last cell), which we modify to suit our needs. There is certainly some more cleanup we could do, like getting rid of the Matplotlib imports; feel free to tidy things up if you like.
  • Loading in your custom images: In the Jupyter notebook, make the necessary imports to load your images from a directory, modify the notebook to meet your needs, and run it (a rough sketch of this follows below).
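As a rough illustration, here is a minimal sketch of how you might load your own images with OpenCV and hand them to the notebook's detection loop. The TEST_IMAGE_DIR path is a placeholder of my own choosing, and the loop body is only indicative; adapt it to the variable names used in object_detection_tutorial.ipynb.

import os
import glob
import cv2
import numpy as np

# Hypothetical folder holding your custom traffic light images
TEST_IMAGE_DIR = 'test_images'
TEST_IMAGE_PATHS = glob.glob(os.path.join(TEST_IMAGE_DIR, '*.jpg'))

for image_path in TEST_IMAGE_PATHS:
    # OpenCV loads images as BGR; the detection graph expects RGB
    image_bgr = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    image_expanded = np.expand_dims(image_rgb, axis=0)
    # ...feed image_expanded to the detection graph exactly as the
    # notebook's last cell does, then visualize the resulting boxes...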

How to build a traffic light detection model using TensorFlow Object Detection API?

Add your objects of interest to the pre-trained model, or use that model’s weights to give yourself a head start on training these new objects. Choosing a model from the TensorFlow Object Detection API is basically a trade-off between accuracy and speed. Considering the dataset I was working on, I chose to use faster_rcnn_inception_v2_coco. Please see the full list of available models in the TensorFlow detection model zoo.

Steps:

  1. Collect around 500 (or more, if you choose) custom traffic light images. For this project, I chose images from the Udacity Simulator, Udacity’s CARLA test site, and the web
  2. Annotate the custom images using ‘labelImg’
  3. Split them into train-test sets
  4. Generate a TFRecord for the train-test split
  5. Setup a config file
  6. Train the actual model
  7. Export the graph from the newly trained model
  8. Bring in the frozen_inference_graph to classify traffic lights in real-time

Step 1: Collecting custom traffic light images

  • Simulator: Run the simulator with the camera on and follow the instructions to collect the traffic light images from the simulator
A few sample images collected from the Udacity Simulator
  • Site: Download the Carla site’s traffic light images from the ROS bag provided by Udacity
A few sample images collected from Udacity’s CARLA site

Step 2: Annotate the custom traffic light images

Hand label the traffic light dataset images by using ‘labelImg’.
Steps:

  • Clone the labelImg repository
  • Follow the installation steps that match your Python version. For Python 3 on Ubuntu:
sudo apt-get install pyqt5-dev-tools
sudo pip3 install lxml
make qt5py3
  • Run python3 labelImg.py
  • Open the dir where you have saved all your traffic light images
  • For every image, create a RectBox around the traffic lights you want to detect and add the labels accordingly (RED, GREEN, YELLOW in our case)
  • Save them in the directory of your choice. Follow these steps to custom label all the images
The ‘labelImg GUI’ where the sample traffic light is annotated as ‘Red’

Step 3: Train-test split

Do a 90–10 split: Add the annotated images and their matching XML annotation files to two separate folders named ‘train’ (90% of the images) and ‘test’ (10% of the images).
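If you prefer to script the split rather than copy files by hand, here is a minimal sketch, assuming every annotated image sits next to its XML file in an images/ folder (the images/train and images/test folder names match the layout used by the scripts in the next step):

import os
import glob
import random
import shutil

random.seed(42)  # fixed seed so the split is reproducible

image_paths = glob.glob('images/*.jpg')
random.shuffle(image_paths)

split_index = int(0.9 * len(image_paths))  # 90-10 split
splits = {'train': image_paths[:split_index], 'test': image_paths[split_index:]}

for split_name, paths in splits.items():
    os.makedirs(os.path.join('images', split_name), exist_ok=True)
    for image_path in paths:
        xml_path = os.path.splitext(image_path)[0] + '.xml'
        # copy each image together with its matching annotation file
        shutil.copy(image_path, os.path.join('images', split_name))
        shutil.copy(xml_path, os.path.join('images', split_name))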

Step 4: Generating TFRecords for the train-test split

We need some helper code from Dat Tran’s raccoon_dataset repository from GitHub. We just need 2 scripts from this repo: xml_to_csv.py and generate_tfrecord.py.

(1) xml_to_csv.py:

  • Make the necessary modifications in the main function of this file. It iterates through the train and test folders to create separate CSVs, and from those CSVs we then generate the TFRecords

Within the xml_to_csv.py script, I replaced:

def main():
    image_path = os.path.join(os.getcwd(), 'annotations')
    xml_df = xml_to_csv(image_path)
    xml_df.to_csv('raccoon_labels.csv', index=None)
    print('Successfully converted xml to csv.')

With:

def main():
    for directory in ['train', 'test']:
        image_path = os.path.join(os.getcwd(), 'images/{}'.format(directory))
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv('data/{}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')
  • Run the python3 xml_to_csv.py command. Now, you have the CSV files ready: train_labels.csv and test_labels.csv
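As a quick sanity check (not part of the original scripts), you can peek at the generated CSVs with pandas; the column names come from Dat Tran's xml_to_csv helper:

import pandas as pd

# Expected columns: filename, width, height, class, xmin, ymin, xmax, ymax
train_labels = pd.read_csv('data/train_labels.csv')
print(train_labels['class'].value_counts())  # how many Red / Yellow / Green boxes
print(train_labels.head())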

(2) generate_tfrecord.py:

  • Next, we need to grab the TFRecord: Go to datitran/raccoon_dataset/blob/master/generate_tfrecord.py
  • The only modification you need to make here is in the class_text_to_int function: change it to match your specific classes. In this case, add a row_label check for each of Red, Yellow, and Green in the if-elif-else statement
# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'Red':
        return 1
    elif row_label == 'Yellow':
        return 2
    elif row_label == 'Green':
        return 3
    else:
        return None

Remember to use the same id values in the label map:

%%writefile training/labelmap.pbtxt
item {
  id: 1
  name: 'Red'
}
item {
  id: 2
  name: 'Yellow'
}
item {
  id: 3
  name: 'Green'
}
  • Make sure you have installed the Object Detection API (installation.md on GitHub). Run the two commands, one for train and one for test, from the ‘Usage’ section of generate_tfrecord.py.

For the train TFRecord:

python3 generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record --image_dir=images/

For the test TFRecord:

python3 generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record --image_dir=images/

Now we have the train and test record files ready. We need TFRecords because the Object Detection API consumes data in this format, so annotations in whatever format you start with (say, PASCAL VOC) have to be converted to TFRecord first.
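To confirm the records were actually written, a minimal sketch like the one below (using the TF 1.x API this project was built against) simply counts the serialized examples in each file:

import tensorflow as tf

# Count the serialized examples in each TFRecord (TF 1.x API)
for record_path in ['data/train.record', 'data/test.record']:
    count = sum(1 for _ in tf.python_io.tf_record_iterator(record_path))
    print('{}: {} examples'.format(record_path, count))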

Step 5: Setup a configuration file

To train our model, we need to set up a config file (along with the TFRecords and a pre-trained model). Please find all the relevant files, installation information, pre-trained models, and more in the TensorFlow Object Detection API repository.

Steps:

  • Go to https://github.com/tensorflow/models/tree/master/research/object_detection/samples/configs on GitHub
  • Download the configuration file (faster_rcnn_inception_v2_coco.config) and the checkpoint (.tar.gz file) for the faster_rcnn_inception_v2_coco model from the TF model detection zoo
  • Run the two commands: one to download the config file and one to download the Faster R-CNN model, then extract the downloaded model. Put the config file in the training directory, and extract faster_rcnn_inception_v2_coco into the models/object_detection directory
  • Modify the config file to meet your requirements, including, but not limited to, PATH_TO_BE_CONFIGURED, num_classes, batch_size, the checkpoint name, the path to fine_tune_checkpoint, and label_map_path: “training/object-detect.pbtxt” (see the excerpt after this list)
  • Add the label map file with item and id values each for RED, YELLOW, and GREEN (as shown in Step 4)
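For reference, the fields you will typically edit in faster_rcnn_inception_v2_coco.config look roughly like the excerpt below. The paths shown are assumptions based on the directory layout used in this post; point them at wherever you extracted the checkpoint and placed your records and label map.

model {
  faster_rcnn {
    num_classes: 3    # Red, Yellow, Green
    ...
  }
}
train_config: {
  batch_size: 1
  fine_tune_checkpoint: "faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  ...
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "training/labelmap.pbtxt"   # the label map created in Step 4
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "training/labelmap.pbtxt"
}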

Step 6: Train the actual model

  • From within models/object_detection, run the training script, passing the directory where checkpoints should be saved and the path to the pipeline config file:
python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_coco.config
  • At this point, barring errors, you should see the model summary with the steps and their corresponding loss values as shown below
INFO:tensorflow:global step 11788: loss = 0.6717 (0.398 sec/step)
INFO:tensorflow:global step 11789: loss = 0.5310 (0.436 sec/step)
INFO:tensorflow:global step 11790: loss = 0.6614 (0.405 sec/step)
INFO:tensorflow:global step 11791: loss = 0.7758 (0.460 sec/step)
INFO:tensorflow:global step 11792: loss = 0.7164 (0.378 sec/step)
INFO:tensorflow:global step 11793: loss = 0.8096 (0.393 sec/step)

Your steps start at 1 and the loss will be much higher at first. Depending on your GPU and how much training data you have, this process will take a varying amount of time. You want to shoot for an average loss of about 1 (or lower).

  • You could load up Tensorboard to visualize the values including loss, accuracy, steps and training time
  • Now, you have the trained model ready. Next, load the model via checkpoint

Faster R-CNN Model Architecture:
Faster R-CNN was originally published in NIPS 2015. Its architecture is complex because it has several moving parts.

Here’s a high-level overview of the model. It all starts with an image, from which we want to obtain:

  • a list of bounding boxes
  • a label assigned to each bounding box
  • a probability for each label and bounding box
Featured Image Credit: Atakan Körez and Necaattin Barışçı

A blog post by Javier Rey on Tryolabs has a pretty good explanation of how object detection works with Faster R-CNN.
For a quick overview of Faster R-CNN and its Region Proposal Network, please refer to Hao Gao’s blog post on Medium.

NOTE:

  • After experimenting with different models including SSD Inception V2, Faster R-CNN, and Nvidia’s Convolutional Neural Network, we eventually decided to go with Faster R-CNN after finding its performance to be compelling for our traffic light dataset. At the end of the day, choosing an appropriate model is a trade-off between ‘accuracy’ and ‘speed’ to meet your requirements.
  • Please find the state-of-the-art models in object detection on Papers With Code. I chose Faster R-CNN because its performance met our requirement for ‘accuracy’ rather than ‘speed’ in the object detection process. Moreover, TensorFlow has not kept its Object Detection API repository up to date with the latest SOTA models.

Step 7: Export the graph from the newly trained model

  • In this step, we are going to test our model and see if it does what we had hoped. In order to do this, we need to export the inference graph. In the models/object_detection directory, there is a script that does this for us: export_inference_graph.py
  • Go to the ‘Protobuf Compilation’ section in installation.md (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) and export the path following the given instructions. This loads up TensorFlow, then builds the graph and saves it
  • Use this to do the object detection using the notebook object_detection_tutorial.ipynb that came in with the API

Step 8: Bring in the frozen_inference_graph to classify the traffic lights in real-time

  • Modify export_inference_graph.py to meet your requirements. To run it, you just need to pass in your checkpoint, your pipeline config, and the directory where you want the inference graph to be placed. For example:
python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/faster_rcnn_inception_v2_coco.config \
--trained_checkpoint_prefix training/model.ckpt-10856 \
--output_directory traffic_lights_inference_graph
  • Run the installation command to export the inference graph (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md). Now, you have the frozen_inference_graph.pb and checkpoint files ready
  • Your checkpoint files should be in the training directory. Just look for the one with the largest step (the largest number after the dash), and that's the one you want to use. Next, make sure the pipeline_config_path is set to whatever config file you chose, and then finally choose the name for the output directory, say, traffic_lights_inference_graph

Run the above command from models/object_detection. If you get an error about no module named 'nets', then you need to re-run:

# From tensorflow/models/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
# switch back to object_detection after this and re-run the above command
  • Open the object_detection_tutorial.ipynb notebook. Make the necessary modifications in the notebook including, but not limited to, MODEL_NAME, PATH_TO_CKPT, PATH_TO_LABELS, NUM_CLASSES, TEST_IMAGE_PATHS. Run the notebook to see your traffic lights with bounding boxes and their prediction accuracies
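If you would rather run the classifier outside the notebook, here is a minimal TF 1.x sketch of loading the exported frozen_inference_graph.pb and running it on a single image. The tensor names are the standard ones the Object Detection API exports; the file paths are assumptions you should adjust to your own setup.

import cv2
import numpy as np
import tensorflow as tf

PATH_TO_FROZEN_GRAPH = 'traffic_lights_inference_graph/frozen_inference_graph.pb'

# Load the frozen detection graph into memory (TF 1.x style)
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    # OpenCV reads BGR; the model expects RGB
    image = cv2.cvtColor(cv2.imread('test_images/sample.jpg'), cv2.COLOR_BGR2RGB)
    image_expanded = np.expand_dims(image, axis=0)

    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    boxes_t = detection_graph.get_tensor_by_name('detection_boxes:0')
    scores_t = detection_graph.get_tensor_by_name('detection_scores:0')
    classes_t = detection_graph.get_tensor_by_name('detection_classes:0')

    boxes, scores, classes = sess.run([boxes_t, scores_t, classes_t],
                                      feed_dict={image_tensor: image_expanded})
    # class ids map back to the label map: 1 = Red, 2 = Yellow, 3 = Green
    print(classes[0][:5], scores[0][:5])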

Results:

Images from the simulator classified with bounding boxes and prediction accuracies
Images from the site classified with bounding boxes and prediction accuracies

Limitations:

  • Given the scope of this project, I used only a few images from the web along with the ones from the simulator and the CARLA site. Also, the model hasn’t been tested on a new, unseen site.
  • The model’s performance in unique lighting and weather conditions is unknown as most of the images used here were captured on a typical bright, sunny day.
  • With ongoing research in the field, newer state-of-the-art object detection models may well achieve better accuracy.

Acknowledgment:

I am grateful to Sebastian Thrun and David Stavens (founders of Udacity); to David Silver, Ryan Keenan, and Cezanne Camacho for assistance with the course material and instructions; and to the guest speakers from official partners including NVIDIA AI, UberATG, Mercedes-Benz, BMW, McLaren, and DiDi, who provided expertise that greatly assisted the research.

Should you have any inputs that you want to share with me (or) provide me with any relevant feedback on my writing or thoughts, I’d be glad to hear from you. Please feel free to connect with me on Twitter, LinkedIn, or follow me on GitHub.
