Training road scene segmentation on Cityscapes with Supervisely, Tensorflow and UNet

Supervise.ly
Towards Data Science
8 min read · Aug 16, 2017


Since 2012, when Alex Krizhevsky published his groundbreaking AlexNet, Deep Learning toolsets have come a long way from a bunch of CUDA C++ files to great, easy-to-use frameworks like Tensorflow and Caffe, stocked with ready-made implementations of powerful architectures like UNet or SSD.

There are also a lot of freely available datasets from different areas, like sign recognition or image captioning.

Now everyone can build and train their very own model. With all that available here and now, what could possibly go wrong?

Cooking data may be hard.

Unfortunately, there are a few things you should worry about:

  1. First, you need to obtain an interesting dataset. Sometimes that’s not an easy task: some links are broken, and some datasets require you to send a request and wait for a few days (or weeks, or forever).
  2. You have to be careful with statistics and distributions. It’s really easy to spend a lot of time training your model with no results if you don’t know your data well.
  3. Sometimes you want to annotate something specific that isn’t available in any public dataset. For example, you could label car plates with the actual license plate number or mark road defects.
  4. More data is better (in Deep Learning). It’s always a good idea to combine a few datasets to make training easier. Of course, you should first do some pre-processing: for example, one dataset may have just “Vehicle”, while another has “Car”, “Bike” and “Bus”. Re-mapping class labels is a must-do (see the sketch after this list).
  5. Augmentation is another great way to make your dataset bigger. And yes, most frameworks don’t ship that feature ready to use.
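As a quick illustration of point 4, remapping per-pixel class IDs comes down to a lookup table. This is only a sketch with made-up class IDs, not the mapping we use later in the tutorial:

import numpy as np

# Hypothetical IDs: 1 = "Car", 2 = "Bike", 3 = "Bus"; we collapse them all
# into a single "Vehicle" class (ID 1) so two datasets can be merged.
REMAP = {0: 0, 1: 1, 2: 1, 3: 1}

def remap_labels(label_map: np.ndarray) -> np.ndarray:
    """Remap a 2-D array of per-pixel class IDs via a lookup table."""
    lut = np.arange(label_map.max() + 1)  # IDs not listed in REMAP stay unchanged
    for old_id, new_id in REMAP.items():
        lut[old_id] = new_id
    return lut[label_map]

labels = np.array([[0, 1, 3],
                   [2, 0, 1]])
print(remap_labels(labels))  # all vehicle IDs collapse to 1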

Here at Deep Systems we come across these kinds of difficulties every day. Preparing data is not the fun part of building a model, but it is a very important one, so we created a service that helps to solve all the problems above (and much more!).

In this article we want to introduce you to Supervise.ly and show you an example of a real-world problem: we will train our very own road segmentation model on a combination of several publicly available datasets with the help of Tensorflow, Keras and a promising architecture called UNet.

Disclaimer: you can skip the Supervise.ly data preparation part and do it on your own, but it would probably take much more time.

Preparing Cityscapes

Cityscapes is a street-scene dataset recorded in 50 cities. It has 5,000 high-quality annotated frames and 30 classes such as “Sidewalk” or “Motorcycle”.

Cityscapes annotation example

What makes it different from similar datasets is that Cityscapes provides annotations in the form of polygons and points, not just large bitmaps.

Step 1: Download dataset

You can obtain the dataset here; you will probably have to wait a day or two to get a confirmation link, since the dataset license doesn’t allow redistribution.

After you get a login, download the following archives: gtFine_trainvaltest.zip (241MB) and leftImg8bit_trainvaltest.zip (11GB).

We are only interested in the pixel-level subset of the Cityscapes dataset, so you don’t have to download the other archives.

Now unpack the downloaded archives into a folder so that you end up with the following structure:

Both gtFine and leftImg8bit contain folders like train and val, and those folders contain per-city folders, like aachen.

As you can see, Cityscapes is a big dataset. If you want, you can complete this tutorial with the whole Cityscapes dataset, but we suggest keeping only a few cities, at least for the first try. So, in the end, you should have something like this:

$ tree -d
.
├── gtFine
│   ├── train
│   │   ├── aachen
│   │   ├── bochum
│   │   └── bremen
│   └── val
│       └── frankfurt
└── leftImg8bit
    ├── train
    │   ├── aachen
    │   ├── bochum
    │   └── bremen
    └── val
        └── frankfurt

Pack this whole directory into a new archive in one of the supported Supervisely import formats: .zip, .tar or .tar.gz.
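If you would rather script the last two steps (keeping only a few cities and creating the archive), here is a minimal Python sketch; the city list and paths are assumptions, so adapt them to your setup:

import shutil
from pathlib import Path

KEEP = {"aachen", "bochum", "bremen", "frankfurt"}  # illustrative choice
root = Path(".")  # the folder that contains gtFine/ and leftImg8bit/

# Drop every city folder we don't want to upload.
for subset in ("gtFine", "leftImg8bit"):
    for split in ("train", "val", "test"):
        split_dir = root / subset / split
        if not split_dir.is_dir():
            continue
        for city_dir in split_dir.iterdir():
            if city_dir.is_dir() and city_dir.name not in KEEP:
                shutil.rmtree(city_dir)

# Pack what is left into ../cityscapes.tar.gz. Writing the archive outside
# the folder being packed keeps it from being included in itself.
shutil.make_archive("../cityscapes", "gztar", root_dir=".")

Now we are ready to move forward.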

Step 2: Sign up to Supervise.ly

Create an account: it takes just a minute and is completely free.

Signup page

Step 3: Import Cityscapes dataset

Great! Right now you have no datasets in Supervisely, so it’s time to create your first one. You can upload your own images, but for now we will use Cityscapes. Open the “Import” page and select the “Open-source dataset format” option. We support several of the most popular public datasets; choose “Cityscapes”. On the right side you will see an example of the archive you have to upload: this is the file we created in the first step.

Enter a project name (for this tutorial, enter “cityscapes”) and move to the last step: choose the prepared archive on your computer and start uploading.

You can monitor progress on the “Tasks status” page.

Step 4: Check out the stats

When the task status changes to “Done”, everything is OK and you can start doing cool stuff with your images.

First, let’s check out some numbers. Open your newly created project “cityscapes” and go to the “Statistics” page.

Here you will find information about class distribution, figure areas and more. You can get some insights into your data here. For example, you can see that only 10% of the images have at least one Bus, or that there are more Poles than Cars.

Step 5: Export data

We don’t want to change the annotations in the dataset, so we can skip right to the export.

One of the most powerful features in Supervise.ly is the export tool. It allows you to set up a pipeline of transformations using a simple JSON-based configuration. We support a lot of useful stuff there:

  • Image crop & rotation
  • Brightness & contrast correction
  • Filter out some figures
  • Re-map class labels
  • Split the dataset into train & validation

and much more.

You can save those configurations, transfer them and reuse them on different projects, and, of course, no coding is required.

To train UNet we need to:

  • Reduce the number of classes from 30 to just a few (for example, four: “bg”, “vehicle”, “road” and “neutral”)
  • Resize images to a fixed size (for example, 224×224)
  • Filter out very small figures (for example, less than 0.15% of the image area)
  • Split the data into train and validation based on dataset name (basically, by city)
  • Fill the background with the “neutral” class
  • Generate ground-truth bitmaps

We will do all of the above with a Supervise.ly export configuration.
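The export configuration handles all of this for us, but to make two of these steps concrete, here is a rough Python sketch (not Supervisely code) of the tiny-figure filter and a label-safe resize; the 0.15% threshold and 224×224 size mirror the list above:

import numpy as np
from PIL import Image

def keep_figure(figure_mask: np.ndarray, min_fraction: float = 0.0015) -> bool:
    """Keep a figure only if it covers at least 0.15% of the image area."""
    return np.count_nonzero(figure_mask) / figure_mask.size >= min_fraction

def resize_pair(image: Image.Image, label_map: Image.Image, size=(224, 224)):
    """Resize an image and its label map to a fixed size.

    The label map must use nearest-neighbour interpolation so that
    class IDs are never blended into meaningless in-between values.
    """
    return image.resize(size, Image.BILINEAR), label_map.resize(size, Image.NEAREST)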

Click the “Export” tab to start a new export. On the left side you can define your export configuration in JSON, or just add new transformation layers in the UI by pressing the “Add Layer” button.

Each change updates the right side, where you will see a graph representing the export transformations.

We made a predefined config for this tutorial. You can obtain it by cloning our repository:

git clone https://github.com/DeepSystems/supervisely-tutorials

Then copy the config from the unet_training/src/experiment_001/config.json file and paste its contents into the editor in the left panel.

You will see something similar to the screenshot above. Let me explain what’s going on here:

You can read more about export and layers in the documentation.

Click “Start Exporting”, enter a name for this export, and Supervise.ly will start preparing your data. You can monitor progress on the “Tasks” page.

When the export is completed, you will see the “Done” status. Click the “three dots” icon and then “Download”.

Train UNet

Now you should have an archive with the desired images. Let’s take a look at some pictures.

Folder “check” contains debug output with original and ground-truth images

It’s time to train our model. In the GitHub repository supervisely-tutorials we already have everything prepared.

Extract the downloaded archive into the folder unet_training/data/cityscapes. You will end up with something like this:

$ cd unet_training/data
$ tree -d
.
└── cityscapes
    └── city2
        ├── aachen
        │   └── train
        │       ├── check
        │       ├── gt
        │       └── orig
        └── frankfurt
            └── val
                ├── check
                ├── gt
                └── orig
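If you want to peek at the exported data outside the notebook, a small sketch like this pairs each input image with its ground truth. It assumes matching PNG file names in the orig and gt folders, so adjust it if your export differs:

import glob
import os

import numpy as np
from PIL import Image

def load_split(split_dir):
    """Load (image, ground-truth) pairs from one exported split folder."""
    images, masks = [], []
    for orig_path in sorted(glob.glob(os.path.join(split_dir, "orig", "*.png"))):
        gt_path = os.path.join(split_dir, "gt", os.path.basename(orig_path))
        images.append(np.array(Image.open(orig_path)))
        masks.append(np.array(Image.open(gt_path)))
    return np.stack(images), np.stack(masks)

# For example (paths follow the tree above):
# x_train, y_train = load_split("cityscapes/city2/aachen/train")
# x_val, y_val = load_split("cityscapes/city2/frankfurt/val")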

To start training you need a GPU with CUDA, Tensorflow and a few libraries. You can resolve all the dependencies and install everything manually, or (a better idea!) just build & run the docker image:

cd unet_training/docker
./build.sh
./run.sh

Docker will build an image packed with Tensorflow and GPU support. It also mounts the src and data/cityscapes folders inside the container. Now copy the Jupyter Notebook link into your favourite browser.

In the example above, the link is http://localhost:8888/?token=48938f4268b9c230cd450fa614d33abbb0ed3f98c8c245fe.

Navigate to the folder experiment_001 and open unet_train.ipynb.
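The notebook already defines the full model, so you don’t have to write any of this yourself. To give a rough idea of what a UNet looks like in Keras, here is a heavily shrunken sketch (one downsampling level instead of the usual four or five, and four output classes as in our export); it is not the code from the repository:

from tensorflow.keras import layers, models

def tiny_unet(input_shape=(224, 224, 3), num_classes=4):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: convolve, then downsample.
    c1 = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    c1 = layers.Conv2D(32, 3, activation="relu", padding="same")(c1)
    p1 = layers.MaxPooling2D(2)(c1)

    # Bottleneck.
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p1)

    # Expanding path: upsample and concatenate with the skip connection.
    u1 = layers.UpSampling2D(2)(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(u1)

    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c2)
    return models.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

The real UNet is deeper, but it follows the same contract-then-expand pattern with skip connections.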

Run the notebook by clicking Cell → Run All. If everything goes well, you’ll see some results after just a few epochs.

The road is recognised (kinda). Train for more epochs to get better results.

After a day of training you can achieve something like this (here we used more classes):

Conclusion

We have gone through every step, from downloading the dataset to exploring the data, applying export transformations and training a model with Tensorflow and Keras.

You can also try other available road-scene datasets, like the Mapillary dataset; Supervisely supports it too.

If you have questions, we’re happy to help you.

Previous tutorial: Number plate detection with Supervisely and Tensorflow.
