Road Surface Semantic Segmentation

Detecting potholes, water-puddles, different types of terrain and more

Thiago Rateke
Towards Data Science


Hello there! This post is about a road surface semantic segmentation approach. The focus here is on road surface patterns: what kind of pavement the vehicle is driving on, whether there is any damage on the road, as well as road markings, speed-bumps, and other things that can be relevant for a vehicular navigation task.

Here I will show you the step-by-step approach based on the paper available at Autonomous Robots (Springer) [1]. The Ground Truth and the experiments were built on the RTK dataset [2], whose images were captured with a low-cost camera and cover roads with different types of pavement and different conditions of pavement quality.

It was fun to work on, and I’m excited to share it; I hope you enjoy it too. 🤗

Introduction

The purpose of this approach is to verify the effectiveness of using passive vision (camera) to detect different patterns on the road: for example, to identify whether the road surface is asphalt, cobblestone, or an unpaved (dirt) road. This may be relevant for an intelligent vehicle, whether an autonomous vehicle or one equipped with an Advanced Driver-Assistance System (ADAS). Depending on the type of pavement, it may be necessary to adapt the way the vehicle is driven, whether for the safety of its users, the conservation of the vehicle, or the comfort of the people inside.

Another relevant aspect of this approach is the detection of potholes and water-puddles, which can cause accidents and damage vehicles, and are quite common in developing countries. This approach can also be useful for departments or organizations responsible for maintaining highways and roads.

To achieve these objectives, Convolutional Neural Networks (CNN) were used for the semantic segmentation of the road surface; I’ll talk more about that in the next sections.

Ground Truth

To train the neural network and to test and validate the results, a Ground Truth (GT) was created with 701 images from the RTK dataset. This GT is available on the dataset page and is composed of the following classes:

GT classes [1]
GT Samples [1]

The approach and setup

Everything here was done using Google Colab, a free Jupyter notebook environment that gives us free access to GPUs, is super easy to use, and is very helpful for organization and configuration. I also used fastai [3], the amazing deep learning library. To be more precise, the step-by-step I will present is heavily based on one of the lessons from Jeremy Howard’s deep learning course, in this case lesson3-camvid.

The CNN architecture used was the U-NET [4], an architecture designed for semantic segmentation of medical images but successfully applied to many other domains. The U-NET is built with a ResNet [5] based encoder and a corresponding decoder; the experiments for this approach were done with resnet34 and resnet50.

For the data augmentation step, the standard options from the fastai library were used, applying random horizontal flips and perspective warping. With fastai it is easy to ensure that the same variations from the data augmentation step are applied to both the original image and its mask (GT).

A relevant point, which was of great importance for the definition of this approach, is that the classes of the GT are quite unbalanced, with far more pixels of background or surface types (e.g. asphalt, paved, or unpaved) than of the other classes. Unlike an image classification problem, where replicating certain images from the dataset might help balance the classes, here replicating an image would only further increase the gap between the number of pixels in the largest and smallest classes. So, in the defined approach, class weights were used for balancing. 🤔

Based on different experiments, it became clear that just applying the weights is not enough: while the accuracy of the classes with fewer pixels improved, the classes with more pixels (e.g. asphalt, paved, and unpaved) lost accuracy.

The best accuracy values, considering all classes, without losing much quality in the detection of surface types, came from the following configuration: first train a model without weights, producing a model with good accuracy for the surface types; then use that previously trained model as the basis for the next model, which uses the proportional class weights. And that’s it!

You can check the complete code, which I will comment on throughout this post, on GitHub:

Step-by-Step

Are you ready?

gif from https://giphy.com/

Cool, so we start with our initial settings, importing the fastai library and the pathlib module. Let’s call this Step 1.

Step 1 — Initial settings

Road Surface Semantic Segmentation.ipynb
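This cell boils down to something like the following (a minimal sketch, assuming the fastai v1 API used in lesson3-camvid):

```python
# Step 1 — initial settings: the fastai vision module and pathlib
from fastai.vision import *
from pathlib import Path
```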

As we’ll use our dataset from Google Drive, we need to mount it, so in the next cell type:

Road Surface Semantic Segmentation.ipynb
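This is the standard Colab snippet for mounting Drive:

```python
from google.colab import drive

# mount Google Drive under /content/gdrive
drive.mount('/content/gdrive')
```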

You’ll see something like the next image: click on the link and you’ll get an authorization code; just copy and paste it into the expected field.

From author

Now you can access your Google Drive as a file system. This is the start of Step 2: loading our data.

Step 2 — Preparing the data

Road Surface Semantic Segmentation.ipynb

Here “image” is the folder containing the original images. “labels” is the folder containing the masks that we’ll use for training and validation; these are 8-bit images produced by a colormap removal process. In “colorLabels” I’ve put the original colored masks, which we can use later for visual comparison. The “valid.txt” file contains a list of image names randomly selected for validation. Finally, the “codes.txt” file contains the list of class names.
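The data-loading cell might look like this (the Drive folder name is hypothetical; adjust it to wherever you placed the dataset):

```python
# root folder of the RTK ground truth on Google Drive (hypothetical path)
path = Path('/content/gdrive/My Drive/RTK_GT')
path.ls()  # expected: image, labels, colorLabels, valid.txt, codes.txt
```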

From author
Road Surface Semantic Segmentation.ipynb
From author

Now, we define the paths for the original images and for the GT mask images, enabling access to all images in each folder to be used later.

Road Surface Semantic Segmentation.ipynb
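A sketch of those definitions:

```python
path_img = path/'image'    # original images
path_lbl = path/'labels'   # 8-bit masks, one class code per pixel

# collect every image file in each folder
fnames = get_image_files(path_img)
lbl_names = get_image_files(path_lbl)
```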

We can see an example, image 139 from the dataset.

From author
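Displaying it takes a couple of lines (picking index 139 from fnames is just illustrative; the list order depends on the file system):

```python
img_f = fnames[139]        # one sample image from the dataset
img = open_image(img_f)
img.show(figsize=(5, 5))
```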

Next, as shown in the fastai lesson, we use a function to infer the mask filename from the original image filename; the mask holds the class code of each pixel.

Road Surface Semantic Segmentation.ipynb
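In the RTK GT the mask shares the original file name, so the function can be simpler than the camvid one (a sketch, assuming identical file names in the two folders):

```python
# infer the mask path from the image path (same file name, different folder)
get_y_fn = lambda x: path_lbl/x.name

# sanity check on our example: the mask is an 8-bit image of class codes
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5, 5), alpha=1)

# class names, one per code
codes = np.loadtxt(path/'codes.txt', dtype=str)
```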

Step 3 — First Part — Without weights

Here we are at Step 3. Let’s create the DataBunch for training our first model using the data block API: defining where our images come from, which images will be used for validation, and the mask corresponding to each original image. For the data augmentation, the fastai library gives several options, but here we’ll use only the defaults from get_transforms(), which consist of random horizontal flips and perspective warping. Remember to set tfm_y=True in the transform call to ensure that the data augmentation transformations are the same for each mask and its original image. Imagine if we rotated the original image but the mask corresponding to that image was not rotated, what a mess it would be! 😵

Road Surface Semantic Segmentation.ipynb
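A sketch of the data block pipeline (the image size and batch size below are hypothetical; pick what fits your GPU memory):

```python
size = (288, 352)  # hypothetical training size
bs = 8             # hypothetical batch size

# where the images come from, how to split them, and how to label them
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file(path/'valid.txt')
       .label_from_func(get_y_fn, classes=codes))

# tfm_y=True applies the same augmentations to image and mask
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
```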

We continue following the lesson3-camvid example from the fastai course to define the accuracy metric and the weight decay. I’ve used the resnet34 model, since I didn’t see much difference with resnet50 in this approach with this dataset. We can find the learning rate using lr_find(learn); in my case I’ve set it to 1e-4.

Road Surface Semantic Segmentation.ipynb
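A sketch of the learner setup. I use a plain per-pixel accuracy here; the notebook’s metric may differ slightly (e.g. ignoring a void class, as lesson3-camvid does):

```python
def acc_rtk(input, target):
    # plain per-pixel accuracy over all classes
    target = target.squeeze(1)
    return (input.argmax(dim=1) == target).float().mean()

wd = 1e-2  # weight decay, as in lesson3-camvid
learn = unet_learner(data, models.resnet34, metrics=acc_rtk, wd=wd)

lr_find(learn)
learn.recorder.plot()  # inspect the curve and pick a learning rate
```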
From author

Next we run fit_one_cycle() for 10 epochs to check how our model is doing.

Road Surface Semantic Segmentation.ipynb
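With the learning rate chosen:

```python
lr = 1e-4
learn.fit_one_cycle(10, slice(lr))
```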
From author

Using the confusion matrix we can see how good (or bad) the model is for each class so far…

Road Surface Semantic Segmentation.ipynb
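One way to build it is to collect the validation predictions and feed the flattened pixels to scikit-learn (a sketch; for a large validation set you may want to accumulate per batch instead):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

preds, targets = learn.get_preds()            # predictions over the validation set
y_pred = preds.argmax(dim=1).numpy().ravel()  # predicted class per pixel
y_true = targets.numpy().ravel()              # GT class per pixel
cm = confusion_matrix(y_true, y_pred, labels=list(range(len(codes))))
```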
From author

Don’t forget to save the model we’ve trained until now.

Road Surface Semantic Segmentation.ipynb
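Something like this; the checkpoint name is arbitrary:

```python
learn.save('stage-1-rtk')  # checkpoint of the frozen-encoder training
```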

Now we just train the model for more epochs to improve the learning, and remember to save our final model. The slice keyword takes a start and a stop value: the earliest layers train with the start learning rate, and the rate increases gradually across the layer groups up to the stop value for the final layers.

Road Surface Semantic Segmentation.ipynb
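A sketch of that longer run, following the lesson3-camvid recipe (the exact learning-rate range and epoch count are tunable choices):

```python
learn.unfreeze()
# discriminative rates: small for early layers, larger for late ones
lrs = slice(lr/400, lr/4)
learn.fit_one_cycle(12, lrs)
learn.save('stage-2-rtk')  # our final "no weights" model
```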
From author

This is our first model, without weights, which works fine for the road surfaces but doesn’t work well for the smaller classes.

From author

Step 4 — Second Part — With weights

We’ll use the first model in our next step. This part is almost exactly the same as Step 3 from the DataBunch onwards; we just need to remember to load our previous model.

Road Surface Semantic Segmentation.ipynb
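Assuming the same DataBunch and learner definition as before, loading the checkpoint is one line:

```python
learn = unet_learner(data, models.resnet34, metrics=acc_rtk, wd=wd)
learn.load('stage-2-rtk')  # the model trained without weights
```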

And, before we start the training process, we need to set weights for the classes. I defined these weights to try to reflect the proportion in which each class appears in the dataset (in number of pixels). * I ran a Python script with OpenCV just to count the number of pixels of each class over the GT’s 701 images, to get a sense of the proportion of each class… 😓

Road Surface Semantic Segmentation.ipynb
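A sketch of the weighting. The pixel counts below are made-up placeholders, one entry per class; the real numbers come from the OpenCV counting script run over the 701 GT images:

```python
# hypothetical per-class pixel counts (one entry per class in codes.txt)
pixel_counts = torch.tensor([9.1e8, 5.3e8, 4.2e8, 3.5e8, 1.2e7, 8.0e6,
                             5.0e6, 4.0e6, 9.0e6, 3.0e6, 6.0e6, 7.0e6])

# weights inversely proportional to class frequency, so rare classes count more
class_weights = pixel_counts.sum() / (len(pixel_counts) * pixel_counts)

# plug the weights into the loss used by unet_learner
learn.loss_func = CrossEntropyFlat(axis=1, weight=class_weights.cuda())
```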

The rest is exactly like Step 3 presented before; what changes are the results obtained. 😬

From author

Now, it looks like we have a more reasonable result for all classes. Remember to save it!

Road Surface Semantic Segmentation.ipynb
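Again, the name is arbitrary:

```python
learn.save('stage-2-weights-rtk')  # the final weighted model
```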

Results

Finally, let’s see our images, right? Before anything else, it is better to save our results, i.e. the predicted masks for our test images.

Road Surface Semantic Segmentation.ipynb
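A sketch of how the raw predictions can be written to disk (the folder name is hypothetical; pred.data holds the class code of each pixel):

```python
import PIL.Image

results_path = path/'results'
results_path.mkdir(exist_ok=True)

for fn in fnames:                            # or only your test subset
    pred = learn.predict(open_image(fn))[0]  # an ImageSegment of class codes
    arr = pred.data.squeeze(0).numpy().astype('uint8')
    PIL.Image.fromarray(arr).save(results_path/fn.name)
```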

But wait! The images all look completely black, where are my results??? 😱 Calm down, these are the results, just without a color map; if you open one of these images full screen, with high brightness, you can see the small variations, “Eleven Shades of Grey” 🙃. So let’s color our results to make them more presentable. Now we’ll use OpenCV and create a new folder to save our colored results.

Road Surface Semantic Segmentation.ipynb
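Imports and folder creation (the folder name is hypothetical):

```python
import os
import cv2
import numpy as np

color_path = path/'colorResults'
os.makedirs(color_path, exist_ok=True)
```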

So we create a function to identify each class code and colorize each pixel accordingly.

Road Surface Semantic Segmentation.ipynb
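A pure-Python sketch of the colorizing function. The palette is hypothetical (one BGR color per class code, following the 12 GT classes); the post’s actual colors match the colored GT masks:

```python
# hypothetical BGR palette, one row per class code
palette = np.array([[0, 0, 0],        # background
                    [85, 85, 255],    # asphalt
                    [85, 170, 127],   # paved
                    [255, 170, 127],  # unpaved
                    [255, 255, 255],  # road marking
                    [255, 85, 255],   # speed-bump
                    [255, 255, 0],    # cat's-eye
                    [255, 0, 0],      # storm-drain
                    [170, 0, 127],    # patch
                    [0, 85, 85],      # water-puddle
                    [0, 0, 255],      # pothole
                    [255, 85, 0]],    # cracks
                   dtype=np.uint8)

def colorize(mask):
    """Map each class code in a grayscale mask to its palette color (slow, pure Python)."""
    h, w = mask.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            out[i, j] = palette[mask[i, j]]
    return out
```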

Next, we read each image, call the function and save our final result.

Road Surface Semantic Segmentation.ipynb
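Then the loop itself (you can wrap it in a %%timeit cell in Colab to measure it):

```python
for name in os.listdir(results_path):
    mask = cv2.imread(str(results_path/name), cv2.IMREAD_GRAYSCALE)
    cv2.imwrite(str(color_path/name), colorize(mask))
```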

But this process could take more time than necessary; measuring it with %timeit, we get a performance like:

From author

What if we need to test with many more images? We can speed up this step using Cython. So, let’s put a pinch of Cython on that!

gif from https://giphy.com/

So we edit our function that identifies and colorizes each pixel, this time using Cython.

Road Surface Semantic Segmentation.ipynb
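In Colab this takes two cells: one to load the Cython extension, one for the typed function. A sketch equivalent to the pure-Python version above:

```python
%load_ext Cython
```

```python
%%cython
import numpy as np
cimport numpy as np

def colorize_fast(np.ndarray[np.uint8_t, ndim=2] mask,
                  np.ndarray[np.uint8_t, ndim=2] palette):
    """Same per-pixel lookup, but with typed loops compiled by Cython."""
    cdef int h = mask.shape[0]
    cdef int w = mask.shape[1]
    cdef np.ndarray[np.uint8_t, ndim=3] out = np.zeros((h, w, 3), dtype=np.uint8)
    cdef int i, j, c
    for i in range(h):
        for j in range(w):
            c = mask[i, j]
            out[i, j, 0] = palette[c, 0]
            out[i, j, 1] = palette[c, 1]
            out[i, j, 2] = palette[c, 2]
    return out
```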

Then we just read each image, call the function, and save the final result as we did before.

Road Surface Semantic Segmentation.ipynb
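Same loop as before, just swapping in the compiled function (the palette is passed explicitly here):

```python
for name in os.listdir(results_path):
    mask = cv2.imread(str(results_path/name), cv2.IMREAD_GRAYSCALE)
    cv2.imwrite(str(color_path/name), colorize_fast(mask, palette))
```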

And voilà! Now we get a performance like:

From author

Much better, right?

Some results samples

The image below shows some results: the left column has the original images, the middle column the GT, and the right column the result of this approach.

Adapted from [1]

Video with the results

Discussion (Let’s talk about it)

Identifying road surface conditions is important in any scenario; based on it, the vehicle or driver can adapt and make decisions that make driving safer, more comfortable, and more efficient. This is particularly relevant in developing countries, which may have even more road maintenance problems and a considerable number of unpaved roads.

This approach looks promising for dealing with environments with variations in the road surface. This can also be useful for highway analysis and maintenance departments, in order to automate part of their work in assessing road quality and identifying where maintenance is needed.

However, some points were identified and analyzed as subject to improvement.

For the segmentation GT, it may be interesting to divide some classes into more specific ones. The Cracks class, for example, is currently used for different kinds of damage regardless of the road type; it could be split into variations of Cracks for each type of surface, since different surfaces have different types of damage, or into separate classes categorizing each kind of damage.

That’s all for now. Feel free to reach out to me. 🤘

Acknowledgements

This experiment is part of a project on visual perception for vehicular navigation from LAPiX (Image Processing and Computer Graphics Lab).

If you are going to talk about this approach, please cite as:

@article{rateke:2020_3,
author = {Thiago Rateke and Aldo von Wangenheim},
title = {Road surface detection and differentiation considering surface damages},
year = {2020},
eprint = {2006.13377},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
}

References

[1] T. Rateke, A. von Wangenheim. Road surface detection and differentiation considering surface damages, (2020), Autonomous Robots (Springer).

[2] T. Rateke, K. A. Justen and A. von Wangenheim. Road Surface Classification with Images Captured From Low-cost Cameras — Road Traversing Knowledge (RTK) Dataset, (2019), Revista de Informática Teórica e Aplicada (RITA).

[3] J. Howard, et al. fastai (2018). https://github.com/fastai/fastai.

[4] O. Ronneberger, P. Fischer, T. Brox. U-net: Convolutional networks for biomedical image segmentation, (2015), Medical Image Computing and Computer-Assisted Intervention — MICCAI 2015, Springer International Publishing.

[5] K. He, et al. Deep residual learning for image recognition, (2016), IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
