Building custom models using Keras (BiSeNet) Part III

In this article, we are going to build a model using Tensorflow-Keras based on a research paper entitled Bilateral Segmentation Network(BiSeNet).

Prince Canuma
6 min read · Dec 5, 2018

Updated 10/12/18

This year is coming to an end. 2018 was the year of my most amazing Artificial Intelligence (AI) learning journey, and I came to realise that Keras is a formidable high-level API for fast Deep Learning (DL) development. It reminds me of LEGO: you just stack layer on top of layer, and if you are a creative person with a wild imagination you can adapt or create custom bricks to build complex shapes and representations. As an engineer, this means a lot and saves a lot of time: just plug and play.

“Excellence is not a skill, it’s an attitude.” — Ralph Marston

Let the Games Begin

An example of the final result of this tutorial

Bilateral Segmentation Network(BiSeNet)

Semantic segmentation is a fundamental task in computer vision. The main goal of it is to assign semantic labels to each pixel in an image (like in the image above).

In brief, BiSeNet is a state-of-the-art approach to real-time semantic segmentation that employs two novel components:

  • Spatial Path is a method proposed to preserve the spatial size of the original input image and encode rich spatial information (high-resolution features). The Spatial Path is made of three convolution layers, each with stride = 2, followed by batch normalization and ReLU (Rectified Linear Unit).
  • Context Path is designed to work hand in hand with the Spatial Path, providing a sufficient receptive field. In semantic segmentation, the receptive field is of great significance for performance. The Context Path enlarges the receptive field with a fast downsampling strategy (using a backbone network like Xception or Res18), which also demands less computation.

Downsampling — reducing the sampling rate or resolution of a signal. For images, this means reducing the number of pixels; it is a form of image resampling or image reconstruction.

These two paths address the shortcomings of previous approaches to semantic segmentation, which sacrificed accuracy for speed.

(For more detailed information, you can later check Part I, Part II and Spatial Path & Context Path.)

Now, let’s take a look at the model so we can get a visual sense of how we are going to build it. Here is a picture of how it looks:

BiSeNet model

As you can see, there are some repeating blocks; we are going to wrap those in functions.

First, we need to import some files.

Functions

In Keras, you can wrap layers in a function and pass arguments to the layers as you would to any other function. The function below has default arguments covering the most common values used throughout the model.

Convolution blocks
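As a sketch of what such a wrapper could look like (the exact helper in my repo may differ, and the default values shown here are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, kernel_size=3, strides=1, activation='relu'):
    """Conv2D -> BatchNormalization -> activation.

    The defaults cover the most common configuration in the model;
    override them where a block differs.
    """
    x = layers.Conv2D(filters, kernel_size, strides=strides,
                      padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(activation)(x)

# Example: one stride-2 block halves the spatial dimensions.
inp = layers.Input(shape=(224, 224, 3))
out = conv_block(inp, 64, strides=2)  # (None, 112, 112, 64)
```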

Now that we have created these functions, it is much easier to write the other parts of the model without repeating code unnecessarily.

Feature Fusion Module & Attention Refinement Module

In pursuit of better accuracy without loss of speed, the authors also researched methods of fusing the two paths (SP & CP) and refining the final prediction. This pursuit culminated in the proposal of the Feature Fusion Module (FFM) and the Attention Refinement Module (ARM), respectively, which further improve accuracy at an acceptable cost.

As we can see in the picture of the model above, the ARM is used to refine the features from the 16x and 32x downsampling blocks of the backbone network (Xception in our case, though you can use any other model of your choice, like Res18).
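A minimal sketch of one plausible reading of the ARM diagram: global average pooling followed by a 1x1 convolution, batch normalization and a sigmoid produces a per-channel attention vector that re-weights the input features. The layer choices here are my interpretation, not the authors' code:

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_refinement_module(x):
    """Channel-attention gate: pool -> 1x1 conv -> BN -> sigmoid -> scale."""
    filters = x.shape[-1]
    attn = layers.GlobalAveragePooling2D()(x)
    attn = layers.Reshape((1, 1, filters))(attn)
    attn = layers.Conv2D(filters, 1)(attn)
    attn = layers.BatchNormalization()(attn)
    attn = layers.Activation('sigmoid')(attn)
    return layers.Multiply()([x, attn])  # re-weight the channels

# e.g. a 16x-downsampled Xception feature map for a 224x224 input
feat = layers.Input(shape=(14, 14, 728))
refined = attention_refinement_module(feat)  # same shape as the input
```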

Xception is a novel deep convolutional neural network architecture inspired by Inception created by Google.

We also have the FFM, which fuses the features of the two paths, which come from different levels, so we can combine the rich spatial information of the SP with the large receptive field of the CP.
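A hedged sketch of how the FFM could be written, under my reading of the diagram: concatenate the two feature maps, convolve them, then apply a channel-attention branch whose output is scaled and added back residually. The filter counts and activation placement are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_fusion_module(sp, cp, filters):
    """Fuse Spatial Path and Context Path features (my interpretation)."""
    x = layers.Concatenate()([sp, cp])
    x = layers.Conv2D(filters, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    # Channel attention over the fused features.
    attn = layers.GlobalAveragePooling2D()(x)
    attn = layers.Reshape((1, 1, filters))(attn)
    attn = layers.Conv2D(filters, 1, activation='relu')(attn)
    attn = layers.Conv2D(filters, 1, activation='sigmoid')(attn)
    scaled = layers.Multiply()([x, attn])
    return layers.Add()([x, scaled])  # residual combination

# Both paths arrive at 1/8 of the 224x224 input resolution.
sp = layers.Input(shape=(28, 28, 256))
cp = layers.Input(shape=(28, 28, 256))
fused = feature_fusion_module(sp, cp, filters=256)
```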

(For more detailed information on SP and CP please check out this article Spatial Path & Context Path )

Regarding the Context Path (CP), the authors append a global average pooling layer to the tail of Xception, where the receptive field of the backbone network is at its maximum.

With that said, we can make our code cleaner by creating a function that deals with the Context Path and ARM calls.
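A sketch of such a function, under my reading of the diagram: the tail of the backbone is globally pooled to capture global context and used to re-weight the deepest features, the ARM refines the 16x and 32x feature maps, and both branches are upsampled to the Spatial Path's 1/8 resolution. The upsampling factors and the inlined ARM are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def _arm(x):
    # Inlined channel-attention gate (my reading of the ARM).
    f = x.shape[-1]
    a = layers.GlobalAveragePooling2D()(x)
    a = layers.Reshape((1, 1, f))(a)
    a = layers.Conv2D(f, 1)(a)
    a = layers.BatchNormalization()(a)
    a = layers.Activation('sigmoid')(a)
    return layers.Multiply()([x, a])

def context_path(feat_16x, feat_32x):
    """Refine the 16x/32x backbone features and fold in global context."""
    # Global average pooling on the backbone tail, where the receptive
    # field is largest; it re-weights the deepest feature map.
    tail = layers.GlobalAveragePooling2D()(feat_32x)
    tail = layers.Reshape((1, 1, feat_32x.shape[-1]))(tail)
    x16 = _arm(feat_16x)
    x32 = layers.Multiply()([_arm(feat_32x), tail])
    # Bring both branches to the Spatial Path's 1/8 resolution.
    x16 = layers.UpSampling2D(2)(x16)
    x32 = layers.UpSampling2D(4)(x32)
    return layers.Concatenate()([x16, x32])

f16 = layers.Input(shape=(14, 14, 728))   # 16x-downsampled feature
f32 = layers.Input(shape=(7, 7, 2048))    # 32x-downsampled feature
cp = context_path(f16, f32)  # (None, 28, 28, 728 + 2048)
```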

Model

Now we build the model with our LEGO blocks.

Before that, we can build a built-in image preprocessing unit using Lambda layers (pro tip from George Seif’s article 4 Awesome things you can do with Keras).
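A minimal sketch of such a preprocessing layer. Scaling pixels to [-1, 1] matches what Xception's own preprocess_input does; whether the article's code uses exactly this scaling is an assumption:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(224, 224, 3))
# Map raw [0, 255] pixel values to [-1, 1] inside the model itself,
# so callers never need a separate preprocessing step.
x = layers.Lambda(lambda img: img / 127.5 - 1.0)(inp)

pre = tf.keras.Model(inp, x)
```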

Then we create our Spatial Path block in just three lines of code.
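Those three lines could look roughly like this: three stride-2 Conv -> BN -> ReLU blocks that bring a 224x224 input down to 1/8 of its size. The filter counts (64/128/256) are an assumption, not fixed by the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

def spatial_path(x):
    # Three stride-2 conv blocks: the output is 1/8 of the input size,
    # preserving rich, high-resolution spatial information.
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 3, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
    return x

inp = layers.Input(shape=(224, 224, 3))
sp = spatial_path(inp)  # 224 -> 112 -> 56 -> 28
```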

We then call the Keras Xception model pre-trained on ImageNet. I don’t include the top (the fully connected classification layers) because, based on my understanding of the paper, the backbone is used purely as a feature extractor. The input shape is changed to 224x224x3 instead of 299x299x3 because otherwise the dimensions of the SP and CP features don’t match when they are sent to the FFM. Finally, we upscale the output of the network to the original height and width (224x224), print the summary, and generate a picture of the model using Keras’s plot_model.
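A simplified skeleton of that assembly: it shows the backbone call, a per-pixel classification head and the upsampling back to 224x224, but omits the SP/ARM/FFM wiring covered earlier. Here weights=None keeps the sketch self-contained (the article uses weights='imagenet'), and the class count of 32 is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Lambda(lambda img: img / 127.5 - 1.0)(inputs)  # preprocessing

# Xception as a feature extractor, classification top excluded.
# weights=None avoids the ImageNet download in this sketch.
backbone = tf.keras.applications.Xception(
    include_top=False, weights=None, input_tensor=x)
cp = backbone.output  # (None, 7, 7, 2048) for a 224x224 input

# Simplified head: per-pixel class scores, upsampled to full size.
n_classes = 32  # assumption; use your dataset's class count
out = layers.Conv2D(n_classes, 1, activation='softmax')(cp)
out = layers.UpSampling2D(32)(out)  # 7 -> 224

model = tf.keras.Model(inputs, out)
model.summary()
# tf.keras.utils.plot_model(model, 'bisenet.png')  # requires pydot
```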

According to Wikipedia, the ImageNet project is a large visual database designed for use in visual object recognition software research. Over 14 million URLs of images have been hand-annotated by ImageNet to indicate what objects are pictured; in at least one million of the images, bounding boxes are also provided.

You can find all the code and documentation on my GitHub profile.

I have provided you, throughout the series, with the dataset and preprocessing tools, so you can choose a loss function (like categorical_crossentropy), an optimizer (like RMSprop) and data augmentation to train this model and get nice results.

Personally, I want to replicate the results from the paper so I’m going to try and do as the authors did on the paper.

The authors of the paper used an auxiliary loss function, which basically combines auxiliary softmax losses with the principal one (more information in the paper).
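A hedged sketch of how such a joint loss could be wired in Keras, under my reading of the paper: the model exposes a main output plus auxiliary heads, and loss_weights combines the per-output softmax cross-entropy losses. The three-headed toy model and the weights below are my assumptions, not the authors' implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy stand-in model with a main head and two auxiliary heads; in the
# real model the auxiliary heads would supervise Context Path stages.
inp = layers.Input(shape=(8, 8, 3))
feat = layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
main = layers.Conv2D(4, 1, activation='softmax', name='main')(feat)
aux1 = layers.Conv2D(4, 1, activation='softmax', name='aux1')(feat)
aux2 = layers.Conv2D(4, 1, activation='softmax', name='aux2')(feat)
model = tf.keras.Model(inp, [main, aux1, aux2])

# Joint loss L = l_p + alpha * (l_2 + l_3); the paper sets alpha = 1.
alpha = 1.0
model.compile(
    optimizer='sgd',
    loss={'main': 'categorical_crossentropy',
          'aux1': 'categorical_crossentropy',
          'aux2': 'categorical_crossentropy'},
    loss_weights={'main': 1.0, 'aux1': alpha, 'aux2': alpha},
)
```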

I don’t know how to implement the loss function yet. If you know how to do it, please contribute to my GitHub repo and I will make sure to give you a shout-out in my next article.

“I know that I am intelligent because I know that I know nothing.” — Socrates

I’m counting on you!

I would like to say thank you to George Seif for putting together such a fantastic GitHub repo and articles on Medium. Thanks to his GitHub repo I managed to finish my model. Amazing work!!!

Thank you for reading. If you have any thoughts, comments or criticism, please leave them down below.

If you like it and relate to it, please give me a round of applause 👏👏 👏(+50) and share it with your friends.

There is so much more coming… I’m going to make a series on topics related to the project I’m working on.

Follow me if you want to join me on this adventure through the data jungle. :D


Prince Canuma

Helping research & production teams achieve MLOps success | Ex-@neptune_ai