Road Surface Semantic Segmentation

Detecting potholes, water-puddles, different types of terrain and more

Thiago Rateke
Towards Data Science


Hello there! This post is about a road surface semantic segmentation approach. The focus here is on road surface patterns: what kind of pavement the vehicle is driving on, whether there is any damage on the road, as well as road markings, speed-bumps, and other things that can be relevant for a vehicular navigation task.

Here I will show you the step-by-step approach based on the paper available at Autonomous Robots (Springer) [1]. The Ground Truth and the experiments were built on the RTK dataset [2], whose images were captured with a low-cost camera and cover roads with different types of pavement and different conditions of pavement quality.

It was fun to work on, and I’m excited to share it; I hope you enjoy it too. 🤗

Introduction

The purpose of this approach is to verify the effectiveness of using passive vision (camera) to detect different patterns on the road: for example, to identify whether the road surface is asphalt, cobblestone, or an unpaved (dirt) road. This may be relevant for an intelligent vehicle, whether an autonomous vehicle or one equipped with an Advanced Driver-Assistance System (ADAS). Depending on the type of pavement, it may be necessary to adapt the way the vehicle is driven, whether for the safety of its users, the conservation of the vehicle, or the comfort of the people inside.

Another relevant aspect of this approach is the detection of potholes and water-puddles, which can cause accidents and damage vehicles, and are quite common in developing countries. This approach can also be useful for departments or organizations responsible for maintaining highways and roads.

To achieve these objectives, Convolutional Neural Networks (CNN) were used for the semantic segmentation of the road surface; I’ll talk more about that in the next sections.

Ground Truth

To train the neural network and to test and validate the results, a Ground Truth (GT) was created with 701 images from the RTK dataset. This GT is available on the dataset page and is composed of the following classes:

GT classes [1]
GT Samples [1]

The approach and setup

Everything here was done using Google Colab, a free Jupyter notebook environment that gives us free access to GPUs, is super easy to use, and is very helpful for organization and configuration. I also used fastai [3], the amazing deep learning library. To be more precise, the step-by-step I will present is heavily based on one of the lessons from Jeremy Howard’s deep learning course, in this case lesson3-camvid.

The CNN architecture used was the U-NET [4], an architecture designed for semantic segmentation of medical images but successfully applied to many other domains. The U-NET is built with a ResNet [5] based encoder and a corresponding decoder; the experiments for this approach were done with resnet34 and resnet50.

For the data augmentation step, the standard options from the fastai library were used, applying random horizontal flips and perspective warping. With fastai it is easy to ensure that the same variations from the data augmentation step are applied to both the original image and its mask (GT).

A relevant point, which was of great importance for the definition of this approach, is that the classes of the GT are quite unbalanced, with far more pixels of background or surface types (e.g. asphalt, paved, or unpaved) than of the other classes. Unlike an image classification problem, where replicating certain images from the dataset might help balance the classes, here replicating an image would only further increase the gap between the number of pixels in the largest and smallest classes. So, in the defined approach, class weights were used for balancing. 🤔

Based on different experiments, it became clear that just applying the weights is not enough: while the accuracy of the classes with fewer pixels improved, the classes with more pixels (e.g. asphalt, paved, and unpaved) lost accuracy.

The best accuracy values, considering all classes, without losing much quality in the detection of surface types, came from the following configuration: first train a model without weights, producing a model with good accuracy for the surface types; then use that previously trained model as the basis for the next model, which uses the proportional class weights. And that’s it!

You can check the complete code, which I will comment on throughout this post, on GitHub:

Step-by-Step

Are you ready?

gif from https://giphy.com/

Cool, so we start with our initial settings, importing the fastai library and the pathlib module. Let’s call this Step 1.

Step 1 — Initial settings

Road Surface Semantic Segmentation.ipynb
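This cell boils down to something like the following (a minimal sketch, assuming the fastai v1 API used in lesson3-camvid):

```python
# Step 1 — initial settings: the fastai vision module and pathlib
from fastai.vision import *
from pathlib import Path
```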

As we’ll use our dataset from Google Drive, we need to mount it, so in the next cell type:

Road Surface Semantic Segmentation.ipynb
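This is the standard Colab snippet for mounting Drive:

```python
from google.colab import drive

# mount Google Drive under /content/gdrive
drive.mount('/content/gdrive')
```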

You’ll see something like the next image: click on the link and you’ll get an authorization code; just copy and paste it into the expected field.

From author

Now you can access your Google Drive as a file system. This is the start of Step 2: loading our data.

Step 2 — Preparing the data

Road Surface Semantic Segmentation.ipynb

Here “image” is the folder containing the original images. “labels” is the folder containing the masks that we’ll use for training and validation; these are 8-bit images produced by a colormap removal process. In “colorLabels” I’ve put the original colored masks, which we can use later for visual comparison. The “valid.txt” file contains a list of image names randomly selected for validation. Finally, the “codes.txt” file contains the list of class names.
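The data-loading cell might look like this (the Drive folder name is hypothetical; adjust it to wherever you placed the dataset):

```python
# root folder of the RTK ground truth on Google Drive (hypothetical path)
path = Path('/content/gdrive/My Drive/RTK_GT')
path.ls()  # expected: image, labels, colorLabels, valid.txt, codes.txt
```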

From author
Road Surface Semantic Segmentation.ipynb
From author

Now, we define the paths for the original images and for the GT mask images, enabling access to all images in each folder to be used later.

Road Surface Semantic Segmentation.ipynb
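A sketch of those definitions:

```python
path_img = path/'image'    # original images
path_lbl = path/'labels'   # 8-bit masks, one class code per pixel

# collect every image file in each folder
fnames = get_image_files(path_img)
lbl_names = get_image_files(path_lbl)
```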

We can see an example, image 139 from the dataset.

From author
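Displaying it takes a couple of lines (picking index 139 from fnames is just illustrative; the list order depends on the file system):

```python
img_f = fnames[139]        # one sample image from the dataset
img = open_image(img_f)
img.show(figsize=(5, 5))
```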

Next, as shown in the fastai lesson, we use a function to infer the mask filename from the original image filename; the mask holds the class code of each pixel.

Road Surface Semantic Segmentation.ipynb
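In the RTK GT the mask shares the original file name, so the function can be simpler than the camvid one (a sketch, assuming identical file names in the two folders):

```python
# infer the mask path from the image path (same file name, different folder)
get_y_fn = lambda x: path_lbl/x.name

# sanity check on our example: the mask is an 8-bit image of class codes
mask = open_mask(get_y_fn(img_f))
mask.show(figsize=(5, 5), alpha=1)

# class names, one per code
codes = np.loadtxt(path/'codes.txt', dtype=str)
```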

Step 3 — First Part — Without weights

Here we are at Step 3. Let’s create the DataBunch for training our first model using the data block API: defining where our images come from, which images will be used for validation, and the mask corresponding to each original image. For the data augmentation, the fastai library gives several options, but here we’ll use only the defaults from get_transforms(), which consist of random horizontal flips and perspective warping. Remember to set tfm_y=True in the transform call to ensure that the data augmentation transformations are the same for each mask and its original image. Imagine if we rotated the original image but the mask corresponding to that image was not rotated, what a mess it would be! 😵

Road Surface Semantic Segmentation.ipynb
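A sketch of the data block pipeline (the image size and batch size below are hypothetical; pick what fits your GPU memory):

```python
size = (288, 352)  # hypothetical training size
bs = 8             # hypothetical batch size

# where the images come from, how to split them, and how to label them
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file(path/'valid.txt')
       .label_from_func(get_y_fn, classes=codes))

# tfm_y=True applies the same augmentations to image and mask
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
```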

We continue following the lesson3-camvid example from the fastai course to define the accuracy metric and the weight decay. I’ve used the resnet34 model, since I didn’t see much difference with resnet50 in this approach with this dataset. We can find the learning rate using lr_find(learn); in my case I’ve set it to 1e-4.

Road Surface Semantic Segmentation.ipynb
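A sketch of the learner setup. I use a plain per-pixel accuracy here; the notebook’s metric may differ slightly (e.g. ignoring a void class, as lesson3-camvid does):

```python
def acc_rtk(input, target):
    # plain per-pixel accuracy over all classes
    target = target.squeeze(1)
    return (input.argmax(dim=1) == target).float().mean()

wd = 1e-2  # weight decay, as in lesson3-camvid
learn = unet_learner(data, models.resnet34, metrics=acc_rtk, wd=wd)

lr_find(learn)
learn.recorder.plot()  # inspect the curve and pick a learning rate
```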
From author

Next we run fit_one_cycle() for 10 epochs to check how our model is doing.

Road Surface Semantic Segmentation.ipynb
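With the learning rate chosen:

```python
lr = 1e-4
learn.fit_one_cycle(10, slice(lr))
```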
From author

Using the confusion matrix we can see how good (or bad) the model is for each class so far…

Road Surface Semantic Segmentation.ipynb
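One way to build it is to collect the validation predictions and feed the flattened pixels to scikit-learn (a sketch; for a large validation set you may want to accumulate per batch instead):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

preds, targets = learn.get_preds()            # predictions over the validation set
y_pred = preds.argmax(dim=1).numpy().ravel()  # predicted class per pixel
y_true = targets.numpy().ravel()              # GT class per pixel
cm = confusion_matrix(y_true, y_pred, labels=list(range(len(codes))))
```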
From author

Don’t forget to save the model we’ve trained until now.

Road Surface Semantic Segmentation.ipynb
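Something like this; the checkpoint name is arbitrary:

```python
learn.save('stage-1-rtk')  # checkpoint of the frozen-encoder training
```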

Now we just train the model for more epochs to improve the learning, and remember to save our final model. The slice keyword takes a start and a stop value: the earliest layers train with the start learning rate, and the rate increases gradually across the layer groups up to the stop value for the final layers.

Road Surface Semantic Segmentation.ipynb
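A sketch of that longer run, following the lesson3-camvid recipe (the exact learning-rate range and epoch count are tunable choices):

```python
learn.unfreeze()
# discriminative rates: small for early layers, larger for late ones
lrs = slice(lr/400, lr/4)
learn.fit_one_cycle(12, lrs)
learn.save('stage-2-rtk')  # our final "no weights" model
```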
From author

This is our first model, without weights, which works fine for the road surfaces but doesn’t work well for the smaller classes.

From author

Step 4 — Second Part — With weights

We’ll use the first model in our next step. This part is almost exactly the same as Step 3 from the DataBunch onwards; we just need to remember to load our previous model.

Road Surface Semantic Segmentation.ipynb
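Assuming the same DataBunch and learner definition as before, loading the checkpoint is one line:

```python
learn = unet_learner(data, models.resnet34, metrics=acc_rtk, wd=wd)
learn.load('stage-2-rtk')  # the model trained without weights
```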

And, before we start the training process, we need to set weights for the classes. I defined these weights to try to reflect the proportion in which each class appears in the dataset (in number of pixels). * I ran a Python script with OpenCV just to count the number of pixels of each class over the GT’s 701 images, to get a sense of the proportion of each class… 😓

Road Surface Semantic Segmentation.ipynb
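A sketch of the weighting. The pixel counts below are made-up placeholders, one entry per class; the real numbers come from the OpenCV counting script run over the 701 GT images:

```python
# hypothetical per-class pixel counts (one entry per class in codes.txt)
pixel_counts = torch.tensor([9.1e8, 5.3e8, 4.2e8, 3.5e8, 1.2e7, 8.0e6,
                             5.0e6, 4.0e6, 9.0e6, 3.0e6, 6.0e6, 7.0e6])

# weights inversely proportional to class frequency, so rare classes count more
class_weights = pixel_counts.sum() / (len(pixel_counts) * pixel_counts)

# plug the weights into the loss used by unet_learner
learn.loss_func = CrossEntropyFlat(axis=1, weight=class_weights.cuda())
```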

The rest is exactly like Step 3 presented before; what changes are the results obtained. 😬

From author

Now, it looks like we have a more reasonable result for all classes. Remember to save it!

Road Surface Semantic Segmentation.ipynb
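Again, the name is arbitrary:

```python
learn.save('stage-2-weights-rtk')  # the final weighted model
```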

Results

Finally, let’s see our images, right? Before anything else, it is better to save our results, i.e. the predicted masks for our test images.

Road Surface Semantic Segmentation.ipynb
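A sketch of how the raw predictions can be written to disk (the folder name is hypothetical; pred.data holds the class code of each pixel):

```python
import PIL.Image

results_path = path/'results'
results_path.mkdir(exist_ok=True)

for fn in fnames:                            # or only your test subset
    pred = learn.predict(open_image(fn))[0]  # an ImageSegment of class codes
    arr = pred.data.squeeze(0).numpy().astype('uint8')
    PIL.Image.fromarray(arr).save(results_path/fn.name)
```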

But wait! The images all look completely black, where are my results??? 😱 Calm down, these are the results, just without a color map; if you open one of these images full screen, with high brightness, you can see the small variations, “Eleven Shades of Grey” 🙃. So let’s color our results to make them more presentable. Now we’ll use OpenCV and create a new folder to save our colored results.

Road Surface Semantic Segmentation.ipynb
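Imports and folder creation (the folder name is hypothetical):

```python
import os
import cv2
import numpy as np

color_path = path/'colorResults'
os.makedirs(color_path, exist_ok=True)
```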

So we create a function to identify each class code and colorize each pixel accordingly.

Road Surface Semantic Segmentation.ipynb
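A pure-Python sketch of the colorizing function. The palette is hypothetical (one BGR color per class code, following the 12 GT classes); the post’s actual colors match the colored GT masks:

```python
# hypothetical BGR palette, one row per class code
palette = np.array([[0, 0, 0],        # background
                    [85, 85, 255],    # asphalt
                    [85, 170, 127],   # paved
                    [255, 170, 127],  # unpaved
                    [255, 255, 255],  # road marking
                    [255, 85, 255],   # speed-bump
                    [255, 255, 0],    # cat's-eye
                    [255, 0, 0],      # storm-drain
                    [170, 0, 127],    # patch
                    [0, 85, 85],      # water-puddle
                    [0, 0, 255],      # pothole
                    [255, 85, 0]],    # cracks
                   dtype=np.uint8)

def colorize(mask):
    """Map each class code in a grayscale mask to its palette color (slow, pure Python)."""
    h, w = mask.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            out[i, j] = palette[mask[i, j]]
    return out
```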

Next, we read each image, call the function and save our final result.

Road Surface Semantic Segmentation.ipynb
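Then the loop itself (you can wrap it in a %%timeit cell in Colab to measure it):

```python
for name in os.listdir(results_path):
    mask = cv2.imread(str(results_path/name), cv2.IMREAD_GRAYSCALE)
    cv2.imwrite(str(color_path/name), colorize(mask))
```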

But this process could take more time than necessary; measuring it with %timeit, we get a performance like:

From author

What if we need to test with many more images? We can speed up this step using Cython. So, let’s put a pinch of Cython on that!

gif from https://giphy.com/

So we edit our function that identifies and colorizes each pixel, this time using Cython.

Road Surface Semantic Segmentation.ipynb
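In Colab this takes two cells: one to load the Cython extension, one for the typed function. A sketch equivalent to the pure-Python version above:

```python
%load_ext Cython
```

```python
%%cython
import numpy as np
cimport numpy as np

def colorize_fast(np.ndarray[np.uint8_t, ndim=2] mask,
                  np.ndarray[np.uint8_t, ndim=2] palette):
    """Same per-pixel lookup, but with typed loops compiled by Cython."""
    cdef int h = mask.shape[0]
    cdef int w = mask.shape[1]
    cdef np.ndarray[np.uint8_t, ndim=3] out = np.zeros((h, w, 3), dtype=np.uint8)
    cdef int i, j, c
    for i in range(h):
        for j in range(w):
            c = mask[i, j]
            out[i, j, 0] = palette[c, 0]
            out[i, j, 1] = palette[c, 1]
            out[i, j, 2] = palette[c, 2]
    return out
```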

Then we just read each image, call the function, and save the final result as we did before.

Road Surface Semantic Segmentation.ipynb
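Same loop as before, just swapping in the compiled function (the palette is passed explicitly here):

```python
for name in os.listdir(results_path):
    mask = cv2.imread(str(results_path/name), cv2.IMREAD_GRAYSCALE)
    cv2.imwrite(str(color_path/name), colorize_fast(mask, palette))
```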

And voilà! Now we get a performance like:

From author

Much better, right?

Some results samples

The image below shows some results: the left column has the original images, the middle column the GT, and the right column the result of this approach.

Adapted from [1]

Video with the results

Discussion (Let’s talk about it)

Identifying road surface conditions is important in any scenario; based on it, the vehicle or driver can adapt and make decisions that make driving safer, more comfortable, and more efficient. This is particularly relevant in developing countries, which may have even more road maintenance problems and a considerable number of unpaved roads.

This approach looks promising for dealing with environments with variations in the road surface. This can also be useful for highway analysis and maintenance departments, in order to automate part of their work in assessing road quality and identifying where maintenance is needed.

However, some points were identified and analyzed as subject to improvement.

For the segmentation GT, it may be interesting to divide some classes into more specific ones. The Cracks class, for example, is currently used for different kinds of damage regardless of the road type; it could be split into variations of Cracks for each type of surface, since different surfaces have different types of damage, or into separate classes categorizing each kind of damage.

That’s all for now. Feel free to reach out to me. 🤘

Acknowledgements

This experiment is part of a project on visual perception for vehicular navigation from LAPiX (Image Processing and Computer Graphics Lab).

If you are going to talk about this approach, please cite as:

@article{rateke:2020_3,
author = {Thiago Rateke and Aldo von Wangenheim},
title = {Road surface detection and differentiation considering surface damages},
year = {2020},
eprint = {2006.13377},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
}

References

[1] T. Rateke, A. von Wangenheim. Road surface detection and differentiation considering surface damages, (2020), Autonomous Robots (Springer).

[2] T. Rateke, K. A. Justen and A. von Wangenheim. Road Surface Classification with Images Captured From Low-cost Cameras — Road Traversing Knowledge (RTK) Dataset, (2019), Revista de Informática Teórica e Aplicada (RITA).

[3] J. Howard, et al. fastai (2018). https://github.com/fastai/fastai.

[4] O. Ronneberger, P. Fischer, T. Brox. U-net: Convolutional networks for biomedical image segmentation, (2015), Medical Image Computing and Computer-Assisted Intervention — MICCAI 2015, Springer International Publishing.

[5] K. He, et al. Deep residual learning for image recognition, (2016), IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
