6 Obstacles to Robust Object Detection

How robust is your detector?

Sabina Pokhrel
Towards Data Science


Can your object detector detect people and horses in the following image?

Photo by Paul Chambers on Unsplash

What if the same image is rotated by 90 degrees? Can it detect people and horses?

Photo by Paul Chambers on Unsplash

Or a cat in these images?

Left most (Photo by Erik-Jan Leusink on Unsplash), Middle (Photo by Krista Mangulsone on Unsplash), Right most (Photo by Ludemeula Fernandes on Unsplash)

We have come a long way in computer vision. AI-based object detection algorithms have outperformed humans on certain tasks. So why is it still a challenge to detect a person when the image is rotated 90 degrees, a cat lying in an uncommon position, or an object when only part of it is visible?
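One quick way to answer the opening questions is to run a pretrained detector on an image and on its 90-degree rotation and compare what it finds. Here is a minimal sketch, assuming torchvision's pretrained Faster R-CNN (whose COCO label map uses class 1 for "person") and a hypothetical local file horses.jpg:

import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import fasterrcnn_resnet50_fpn

PERSON = 1  # "person" in the COCO label map used by this pretrained model

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
to_tensor = transforms.ToTensor()

def count_people(image, score_threshold=0.5):
    # Run the detector and count "person" boxes above the score threshold.
    with torch.no_grad():
        output = model([to_tensor(image)])[0]
    keep = output["scores"] > score_threshold
    return int((output["labels"][keep] == PERSON).sum())

image = Image.open("horses.jpg").convert("RGB")  # hypothetical file name
rotated = image.rotate(90, expand=True)          # the same scene, rotated 90 degrees

print("people found (original):", count_people(image))
print("people found (rotated): ", count_people(rotated))

If the rotated count drops to zero, the detector is not robust to orientation changes, which is exactly the kind of brittleness the rest of this post walks through.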

Many models for object detection and classification have been created since AlexNet in 2012, and they keep getting better in terms of accuracy and efficiency. However, most of these models are trained and tested in ideal scenarios, while the scenarios in which they are deployed are rarely ideal: the background may be cluttered, and the object may be deformed or occluded. Take the cat images below as an example. Any object detector trained to detect cats will, without fail, detect the cat in the image on the left. But for the image on the right, most detectors may fail.

Left (Photo by Edgar Edgar on Unsplash), Right (Photo by Krista Mangulsone on Unsplash)

Tasks that are trivial for humans remain a challenge in computer vision. It is easy for us to identify a person regardless of the image's orientation, a cat in different poses, or a cup viewed from any angle.

Let’s take a look at 6 such obstacles to detecting objects robustly.

1. Viewpoint variation

An object viewed from different angles may look completely different. Take the simple example of a cup (see the images below): the first image, showing the top view of a cup of black coffee, looks completely different from the second, which shows the side and top of a cup of cappuccino, and the third, which shows only the side of a cup.

Cups from different view points. Left (Photo by Jack Carter on Unsplash), Middle (Photo by Pablo Merchán Montes on Unsplash), Right (Photo by NordWood Themes on Unsplash)

This is one of the challenges with object detection because most detectors are trained with images only from a particular viewpoint.

2. Deformation

Many objects of interest are not rigid bodies and can be deformed in extreme ways. As an example, look at the images below of yogis in different positions. If an object detector was trained to detect people using only images of people sitting, standing, or walking, it might not detect the people in these images, because their features may not match what it learned about people during training.

Left (Photo by Avi Richards on Unsplash), Right (Photo by CATHY PHAM on Unsplash)
Left (Photo by GMB Monkey on Unsplash), Right (Photo by Form on Unsplash)

3. Occlusion

The objects of interest can be occluded, so that only a small portion of the object, sometimes as little as a few pixels, is visible.

Woman holding a cup (Photo by Alisa Anton on Unsplash)

For example, in the image above, the object of interest (the cup) is occluded by the hand holding it. When we see only part of an object, in most cases we can instantly identify what it is. Object detectors, however, do not perform as well.

Another example of occlusion is images where a person is holding a mobile phone. It is a challenge to detect mobile phones in these images:

People holding their phones. Left (Photo by Meghan Schiereck on Unsplash), Middle (Photo by William Iven on Unsplash), Right (Photo by Priscilla Du Preez on Unsplash)

4. Illumination conditions

The effects of illumination are drastic at the pixel level. Objects exhibit different colors under different illumination conditions. For example, an outdoor surveillance camera is exposed to varying lighting throughout the day: bright daylight, evening light, and night. An image of a pedestrian looks different under each of these conditions, which affects the detector's ability to detect objects robustly.
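As a rough illustration of how strongly a lighting change shifts the raw inputs a detector sees, one can dim an image and compare pixel statistics. A small sketch, assuming Pillow and NumPy and a hypothetical local file alley.jpg:

import numpy as np
from PIL import Image, ImageEnhance

image = Image.open("alley.jpg").convert("RGB")          # hypothetical file name
dimmed = ImageEnhance.Brightness(image).enhance(0.3)    # roughly simulate evening light

original_px = np.asarray(image, dtype=np.float32)
dimmed_px = np.asarray(dimmed, dtype=np.float32)

# Same scene, very different numbers reaching the detector.
print("mean pixel value, original:", original_px.mean())
print("mean pixel value, dimmed:  ", dimmed_px.mean())
print("mean absolute difference:  ", np.abs(original_px - dimmed_px).mean())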

Alley in different lighting conditions. Left (Photo by Chromatograph on Unsplash), Middle (Photo by Omid Armin on Unsplash), Right (Photo by Sravan V on Unsplash)

5. Cluttered or textured background

The objects of interest may blend into the background, making them hard to identify. For example, the cats and dogs in the images below are camouflaged against the rugs they are sitting or lying on. In such cases, an object detector will struggle to detect them.

Camouflaged cats and dogs (image sources linked in the original post)

6. Intra-class variation

An object class of interest can often be relatively broad, such as "house". There are many different types of houses, each with its own appearance. All of the images below show different types of houses.

Houses. Left (Photo by Jesse Roberts on Unsplash), Middle (Photo by Ralph Kayden on Unsplash), Right (Photo by David Veksler on Unsplash)
Houses. Left (Photo by Ian Keefe on Unsplash), Middle (Photo by Pixasquare on Unsplash), Right (Photo by Stephan Bechert on Unsplash)

A good detector must be robust to the cross product of all these variations, while remaining sensitive to the differences between classes.

Solutions

To create a robust object detector, make sure the training data contains good variation: different viewpoints, illumination conditions, deformations, occlusions, and backgrounds. If you cannot find real-world training data with all these variations, use data augmentation techniques to synthesize the data you need.
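As a minimal sketch of what such augmentation could look like, assuming the Albumentations library (the article does not prescribe a specific tool), Pascal VOC-style boxes, and illustrative transform choices; the dummy image and box exist only so the sketch runs:

import numpy as np
import albumentations as A

# Dummy stand-ins: a random image and a single labeled box.
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
boxes = [[100, 120, 300, 400]]   # pascal_voc format: x_min, y_min, x_max, y_max
labels = ["person"]

augment = A.Compose(
    [
        A.RandomRotate90(p=0.5),               # orientation / viewpoint
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),     # illumination changes
        A.CoarseDropout(p=0.3),                # crude simulated occlusion
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

augmented = augment(image=image, bboxes=boxes, class_labels=labels)
aug_image, aug_boxes = augmented["image"], augmented["bboxes"]
print(aug_image.shape, aug_boxes)

Passing bbox_params keeps the bounding boxes consistent with the transformed image, which is what makes this kind of pipeline usable for detection rather than just classification.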

What techniques have you used to make your object detector robust? Leave your thoughts as comments below.

Originally published at www.xailient.com/blog.

Looking to implement real-time face detection on a Raspberry Pi? Check out this post.

About the author

Sabina Pokhrel works at Xailient, a computer-vision start-up that has built the world’s fastest Edge-optimized object detector.

