
U-Net for Semantic Segmentation on Unbalanced Aerial Imagery

A PyTorch implementation of U-Net for aerial imagery semantic segmentation.

In this article, we review the problem of semantic segmentation with unbalanced binary masks. Focal loss and IoU loss are introduced as loss functions for tuning the network parameters. Finally, we train a U-Net implemented in PyTorch to perform semantic segmentation on aerial images. The training code and PyTorch implementation are available on GitHub.

Dataset

The dataset used here is "Semantic segmentation of aerial imagery", which contains 72 satellite images of Dubai, the UAE, segmented into six classes: water, land, road, building, vegetation, and unlabeled.

Fig 1. Sample of dataset

U-Net Neural Network

U-Net is a convolutional neural network that was originally presented for biomedical image segmentation at the Computer Science Department of the University of Freiburg. It is based on fully convolutional networks, with an architecture modified and extended to work with fewer training images and yield more precise segmentation.

The primary concept is to use a contracting network followed by an expanding network, with upsampling operators in the expansive path replacing the pooling operations. These layers increase the resolution of the output. In addition, the expansive path can learn to assemble a precise output from the encoded information.

Fig 2. U-Net structure (image by U-Net)

The network consists of a contracting path (left side) and an expansive path (right side), which gives it the U-shaped architecture. The contracting path is a typical convolutional network consisting of repeated convolutions, each followed by a rectified linear unit (ReLU) and a max-pooling operation. During the contraction, spatial information is reduced while feature information is increased. The expansive path then combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path.
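The contracting step, skip connection, and expansive step described above can be sketched as a minimal one-level U-Net in PyTorch. This is an illustrative toy, not the implementation from the repository: the channel widths (16/32) and the `DoubleConv`/`MiniUNet` names are arbitrary choices, and a real U-Net stacks several such levels.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by ReLU (the basic U-Net block)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class MiniUNet(nn.Module):
    """One contracting step and one expansive step with a skip connection."""
    def __init__(self, in_ch=3, n_classes=1):
        super().__init__()
        self.down = DoubleConv(in_ch, 16)
        self.pool = nn.MaxPool2d(2)                        # contract: halve resolution
        self.bottleneck = DoubleConv(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # expand: restore resolution
        self.fuse = DoubleConv(32, 16)                     # after skip concatenation
        self.head = nn.Conv2d(16, n_classes, 1)            # per-pixel class logits

    def forward(self, x):
        skip = self.down(x)                     # high-resolution features
        x = self.bottleneck(self.pool(skip))    # low-resolution, rich features
        x = self.up(x)
        x = torch.cat([skip, x], dim=1)         # skip connection: fuse spatial detail
        return self.head(self.fuse(x))
```

Because the convolutions are padded and the up-convolution exactly undoes the pooling, the output logits have the same spatial size as the input image, one channel per class.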

Loss Function

The purpose of this article is to review the effect of the loss function on segmentation results. Three different loss functions are used in the training procedure, but first, let us review them briefly.

Let p denote the predicted output value for each pixel of the image. We can then define the studied loss functions as below:

Cross-Entropy Loss

Cross-entropy (or log loss) penalizes each prediction by the negative logarithm of the probability assigned to the true class; since we are working with images, it is computed for every pixel in the output tensor.

The alpha term is a per-class weight hyperparameter that balances the loss for unbalanced classes. The final equation for weighted cross-entropy loss is shown in Eq 1.

Eq 1. Weighted cross-entropy loss
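The weighted cross-entropy above can be sketched per pixel as follows; this is a framework-agnostic NumPy illustration, not the repository's PyTorch code, and the `alpha` default of 0.75 is an arbitrary example value:

```python
import numpy as np

def weighted_bce(p, target, alpha=0.75, eps=1e-7):
    """Per-pixel weighted binary cross-entropy.

    p      -- predicted foreground probability per pixel, in (0, 1)
    target -- ground-truth mask (1 = foreground, 0 = background)
    alpha  -- class-balancing weight for the foreground term
    """
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    loss = -(alpha * target * np.log(p)
             + (1 - alpha) * (1 - target) * np.log(1 - p))
    return loss.mean()
```

With `alpha = 0.5` both classes are weighted equally and this reduces to plain binary cross-entropy (up to a factor of 2); raising `alpha` makes errors on the rare foreground class cost more.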

Focal Loss

Focal loss offers a better solution to the unbalanced-dataset problem. It adds a modulating term that reduces the impact of well-classified (easy) predictions so that training focuses on hard, misclassified examples; the gamma hyperparameter specifies how strong this reduction is.

This loss helps the network train on an unbalanced dataset and can improve segmentation results.

Eq 2. Focal loss
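A NumPy sketch of the binary focal loss follows. It is the alpha-weighted cross-entropy scaled by the modulating factor (1 − p_t)^gamma, where p_t is the probability assigned to the true class; the defaults gamma = 2 and alpha = 0.25 are the values from the original focal loss paper, used here only as an illustration:

```python
import numpy as np

def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    Confident (easy) pixels are down-weighted by the modulating
    factor, so hard, misclassified pixels dominate the loss.
    """
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(target == 1, p, 1 - p)             # prob of the true class
    alpha_t = np.where(target == 1, alpha, 1 - alpha) # class-balancing weight
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.mean()
```

Setting `gamma = 0` recovers the weighted cross-entropy; larger values shrink the contribution of well-classified pixels more aggressively.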

IoU Loss (Jaccard index)

Finally, we come to IoU loss, another option for unbalanced segmentation, with fewer hyperparameters than the others. First, let's get familiar with the metric itself in Eq 3.

Eq 3. Intersection over union

In the equation, the numerator is the overlap between the predicted and ground-truth masks, and the denominator is their union. The IoU is the ratio of these two quantities, with values closer to one indicating more accurate predictions.

Fig 3. IoU (image by author)

The purpose of optimization is to maximize the IoU; since it takes values between 0 and 1, the loss function is defined as:

Eq 4. IoU loss
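A common differentiable version of this loss computes a "soft" IoU directly on the predicted probability map. The sketch below is a NumPy illustration under that assumption; the `smooth` term is a conventional trick (not from this article) that keeps the ratio defined when both masks are empty:

```python
import numpy as np

def iou_loss(p, target, smooth=1.0):
    """Soft IoU (Jaccard) loss on probability maps.

    intersection ~ sum(p * target)
    union        ~ sum(p) + sum(target) - intersection
    """
    intersection = (p * target).sum()
    union = p.sum() + target.sum() - intersection
    iou = (intersection + smooth) / (union + smooth)
    return 1.0 - iou
```

A perfect prediction gives a loss of 0, and because the ratio is computed over the whole mask, small foreground regions are not drowned out by the abundant background pixels.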

Training and Results

I trained the U-Net with all three loss functions on the dataset described above. Note that there were only 65 images for training and 7 for validation, so we can't expect great results, but this amount of data is enough for our purpose.

Fig 4. Segmentation results using cross-entropy loss (image by author)

As you can see, cross-entropy struggles to segment small areas and has the worst performance among these loss functions.

Fig 5. Segmentation results using focal loss (image by author)

Focal loss achieves better results, especially in small regions, but its hyperparameters still need tuning through trial and error.

Fig 6. Segmentation results using IoU loss (image by author)

Finally, we can see that IoU loss also does a great job in segmentation, both for small and large areas.

Here you can see some other outputs:

Fig 7. Cross-entropy on the left, focal loss in the middle, and IoU loss on the right (image by author)

Conclusion

In this article, we reviewed the effect of the loss function on segmentation of unbalanced images. We trained a U-Net to perform semantic segmentation on aerial images using three different loss functions: cross-entropy loss, focal loss, and IoU loss.

The results demonstrate that cross-entropy loss cannot handle unbalanced datasets; even adding per-class weights is not very effective. Focal loss and IoU loss, on the other hand, both yield better results for unbalanced image segmentation.

You can also refer to the GitHub page to access the project and the PyTorch implementation.
