Approaching AIcrowd’s LNDST problem in under 50 lines of code!

Published in Towards Data Science

5 min read · Sep 6, 2020

In this article, I’ll illustrate how to approach a core computer vision problem known as semantic segmentation. Simply put, the goal of semantic segmentation is to classify each pixel of an image into one of a set of classes according to what that pixel depicts.
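To make "classify each pixel" concrete, here is a minimal NumPy sketch. It assumes a hypothetical model that outputs one score per class for every pixel, shaped (num_classes, height, width). The predicted mask is simply the highest-scoring class at each pixel. The scores are made up for illustration.

```python
import numpy as np

# Hypothetical model output for a tiny 2x3 image: one score per class per pixel,
# shaped (num_classes, height, width). Class 0 = background, class 1 = water.
scores = np.array([
    [[2.0, 0.1, 0.3],
     [1.5, 0.2, 0.9]],   # background scores
    [[0.5, 1.7, 0.4],
     [0.1, 2.2, 1.1]],   # water scores
])

# Semantic segmentation assigns each pixel the class with the highest score
mask = scores.argmax(axis=0)
print(mask)
# [[0 1 1]
#  [0 1 1]]
```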

LNDST is a classic semantic segmentation problem that can be tackled with CNNs. The Landsat dataset consists of 400x400 RGB satellite images taken by the Landsat 8 satellite. Each image can contain water and background, and our classifier should label each pixel as either 0 (background) or 1 (water). Submissions are ranked by F1 (Dice) score.
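For binary masks, the F1 score and the Dice coefficient are the same quantity: 2·TP / (2·TP + FP + FN), which equals 2|A∩B| / (|A| + |B|). Below is a small NumPy sketch of that formula. The function name `dice_score` and the `eps` smoothing term are my own choices, and how the AIcrowd grader aggregates the score over the test set (per image or over all pixels at once) is an assumption not covered here.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice (F1) score for binary masks where 1 = water and 0 = background."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()   # true positives
    # 2*TP / (2*TP + FP + FN) is the same as 2|A∩B| / (|A| + |B|)
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(round(dice_score(pred, target), 4))   # 0.6667 (TP=2, FP=1, FN=1)
```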

We’ll be using FastAI v1 to approach this problem. FastAI is a popular high-level library built on top of PyTorch. I chose it for this problem because it provides handy features such as a learning rate finder, data loaders that can be created in a couple of lines of code, and several other goodies. Make sure you have downloaded the dataset and extracted it to a folder named data. Let’s start!

from fastai.vision import *

The above line imports FastAI’s vision module.

path = Path('data')
path_img = path/'train_images'
path_lbl = path/'train_gt'

img_names = get_image_files(path_img)
lbl_names = get_image_files(path_lbl)

img_names and lbl_names are lists of file paths to the training images and their corresponding ground-truth masks.

# Batch size
bs = 8

# Class labels: pixel value 0 is background, 1 is water
labels = ['background', 'water']

# Map an image path to the path of its mask (same base name, .png extension)
def get_y_fn(x):
    dest = x.name.split('.')[0] + '.png'
    return path_lbl/dest

src = (SegmentationItemList.from_folder(path_img)   # Load the x data (images) from the folder
       .split_by_rand_pct()                          # Randomly split into training and validation sets
       .label_from_func(get_y_fn, classes=labels))   # Label each image with its mask via get_y_fn

# Define our image augmentations
tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)

data = (src.transform(tfms, size=400, tfm_y=True)   # Augment the images and apply the same transforms to the masks
        .databunch(bs=bs)                           # Create a DataBunch
        .normalize(imagenet_stats))                 # Normalize with ImageNet mean and std

The above code creates an ImageDataBunch object that handles every aspect of the data pipeline: loading, augmentation, splitting into training and validation sets, batching, and normalization. Let us now take a look at a mini-batch of our data.
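The labelling step hinges on get_y_fn, which turns each image path into its mask path. Here is the same logic run on its own with pathlib, using a hypothetical file name (`image_7.jpg`), so you can check that your masks will be found before building the DataBunch.

```python
from pathlib import Path

path_lbl = Path('data') / 'train_gt'

def get_y_fn(x):
    # Same logic as the article: keep the part of the file name before the first '.'
    # and look for a .png mask with that name in the label folder
    dest = x.name.split('.')[0] + '.png'
    return path_lbl / dest

img = Path('data/train_images/image_7.jpg')   # hypothetical file name
print(get_y_fn(img).as_posix())               # data/train_gt/image_7.png
```

One design note: `x.name.split('.')[0]` cuts at the first dot, so a name like `a.b.jpg` maps to `a.png`. If your file names can contain extra dots, `x.stem` (which strips only the last suffix) is the safer choice.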

data.show_batch(8, figsize=(10,10))

This is a random mini-batch after applying random transformations like rotation, flipping, etc.
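Passing tfm_y=True is what keeps the masks aligned with the augmented images: spatial transforms such as flips are applied to the mask as well. The NumPy sketch below is not fastai’s implementation. It only illustrates that guarantee for random flips, using a toy 4x4 image and a helper named `random_flip_pair` that I made up for illustration.

```python
import numpy as np

def random_flip_pair(image, mask, rng):
    """Apply the same random horizontal/vertical flips to an image and its mask.

    image: (H, W, C) array, mask: (H, W) array.
    """
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]   # vertical flip
    return image, mask

# Toy 4x4 "image" whose water pixels (value 255 in every channel) match the mask
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0, :2] = 1
image = np.stack([mask * 255] * 3, axis=-1)

rng = np.random.default_rng(0)
aug_img, aug_mask = random_flip_pair(image, mask, rng)
# Because both arrays get identical flips, the mask still lines up with the image
print(np.array_equal(aug_img[..., 0] == 255, aug_mask == 1))   # True
```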

Now that our data is ready, let us create a model and train it. Several architectures can be used for segmentation, such as U-Net, FPN, DeepLabV3, and PSPNet. We’ll be using U-Net in this article.

# Pretrained encoder
encoder = models.resnet34
learn = unet_learner(data, encoder, metrics=dice)

FastAI’s unet_learner function builds a U-Net-like model around the given encoder and wraps it in an instance of the Learner class, which handles the complete training loop and reports the specified metrics. Only the encoder is initialized with ImageNet-pretrained weights; the decoder starts from scratch. If you are unsure how U-Nets work, check out the original U-Net paper by Ronneberger et al. (2015). Note that we pass dice as a metric: it is computed on the validation set and gives us an idea of how our model might perform on the test set.
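The key idea of a U-Net is the skip connection: each decoder stage upsamples its coarse features and concatenates them with the encoder features of the same resolution. The NumPy sketch below shows only that shape bookkeeping. It has no learned weights, uses nearest-neighbour upsampling for simplicity (real U-Nets, including fastai’s, use learned upsampling and convolutions), and the channel counts and sizes are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decoder_step(decoder_feats, encoder_feats):
    """One U-Net-style decoder step (shapes only, no learned weights).

    The coarse decoder features are upsampled to the encoder's resolution and
    concatenated with the encoder features along the channel axis (the skip
    connection). A real U-Net would follow this with convolutions.
    """
    up = upsample2x(decoder_feats)
    if up.shape[1:] != encoder_feats.shape[1:]:
        raise ValueError("spatial sizes of decoder and encoder features must match")
    return np.concatenate([up, encoder_feats], axis=0)

# Hypothetical shapes: a 256-channel 25x25 bottleneck and a 128-channel 50x50 encoder map
bottleneck = np.zeros((256, 25, 25))
skip = np.zeros((128, 50, 50))
print(decoder_step(bottleneck, skip).shape)   # (384, 50, 50)
```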

learn.lr_find()
learn.recorder.plot()

The resulting plot gives us an idea of what a good learning rate might be. FastAI’s rule of thumb is to pick a value on the steepest downward slope of the curve, where the loss is falling fastest, rather than at the minimum itself. In this case, that is somewhere around 1e-5.
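"Steepest downward slope" can be made precise as the most negative derivative of the loss with respect to log(learning rate). Here is a minimal NumPy sketch of that rule on a made-up loss curve (these numbers are illustrative, not results from this competition). As far as I know, fastai v1 can mark a similar "min numerical gradient" point for you via learn.recorder.plot(suggestion=True).

```python
import numpy as np

def steepest_lr(lrs, losses):
    """Return the learning rate where the loss falls fastest with respect to log(lr)."""
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    slopes = np.gradient(losses, np.log10(lrs))   # d(loss) / d(log10 lr)
    return lrs[np.argmin(slopes)]                 # most negative slope

# Made-up loss curve from an LR range test
lrs = np.logspace(-7, -1, 7)                      # 1e-7, 1e-6, ..., 1e-1
losses = [0.70, 0.69, 0.60, 0.45, 0.40, 0.42, 1.50]
print(steepest_lr(lrs, losses))
```

On this curve the loss is still falling at 1e-4 but falls fastest around 1e-5, which is the value the rule picks.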

# Fit the model
learn.fit_one_cycle(10, 1e-5)

Now, we run 10 epochs with a maximum learning rate of 1e-5. FastAI schedules the learning rate with the one-cycle policy introduced by Leslie Smith. Because unet_learner freezes the pretrained encoder by default, this initial training updates only the decoder’s parameters while keeping the encoder’s weights fixed. Once our model is trained well enough, we can unfreeze the encoder as well and train some more.
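A one-cycle schedule warms the learning rate up from a small value to the maximum, then anneals it down to a much smaller value. The pure-Python sketch below follows that shape with cosine interpolation. The defaults (div_factor=25, final_div=25e4, pct_start=0.3) are assumptions modelled on fastai v1’s fit_one_cycle, and the steps here are training iterations (batches), not epochs. fastai also cycles momentum in the opposite direction, which this sketch omits.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start to end as pct goes from 0 to 1."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0,
                 final_div=25.0 * 1e4, pct_start=0.3):
    """Learning rate at a given step of a one-cycle schedule.

    Warm up from max_lr/div_factor to max_lr over the first pct_start of training,
    then anneal down to max_lr/final_div. Default values are assumptions.
    """
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        return cos_anneal(max_lr / div_factor, max_lr, step / warmup_steps)
    pct = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return cos_anneal(max_lr, max_lr / final_div, pct)

total = 100
schedule = [one_cycle_lr(s, total, max_lr=1e-5) for s in range(total)]
print(f"start={schedule[0]:.1e} peak={max(schedule):.1e} end={schedule[-1]:.1e}")
```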

# Unfreeze and train some more
learn.unfreeze()
learn.fit_one_cycle(10, slice(1e-6, 1e-5))

Next, we train for 10 more epochs with discriminative learning rates: the earliest layer group gets the lowest maximum learning rate (1e-6), and the rates increase for each subsequent layer group up to 1e-5 for the last one. Now that our model is trained, let’s visually inspect whether it works well.
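How does slice(1e-6, 1e-5) turn into one learning rate per layer group? My understanding is that fastai v1 spaces the rates geometrically between the two ends. The sketch below reproduces that idea; the number of layer groups is an assumption, since it depends on how the model is split.

```python
import numpy as np

def discriminative_lrs(start, stop, n_groups):
    """Spread learning rates geometrically from start (first group) to stop (last group)."""
    if n_groups == 1:
        return np.array([stop])
    step = (stop / start) ** (1 / (n_groups - 1))   # constant ratio between neighbouring groups
    return np.array([start * step**i for i in range(n_groups)])

print(discriminative_lrs(1e-6, 1e-5, 3))   # roughly 1e-6, 3.16e-6, 1e-5
```

Geometric spacing means each group’s rate is a constant multiple of the previous one, so the early, more general layers change slowly while the later layers adapt faster.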

learn.show_results(rows=3, figsize=(10,10))

Now that everything is set, we can run inference on the test set and make a submission!

from glob import glob
import numpy as np

# Sort the test images by the number at the end of their file names (e.g. ..._12.jpg)
lst = sorted(glob('data/test_images/*'), key=lambda x: int(x.split('_')[-1].split('.')[0]))

main_array = []
for i in lst:
    # Open image
    img = open_image(i)
    # The first element returned by learn.predict is the predicted mask
    mask = learn.predict(img)[0]
    # Convert torch tensor to numpy array
    mask = mask.data.numpy()
    # Flatten the array
    mask = mask.flatten()
    main_array.append(mask)

main_array = np.asarray(main_array)
main_array_flat = np.reshape(main_array, (-1)).astype(np.uint8)

with open('submission.npy', 'wb') as f:
    np.save(f, main_array_flat)

The above code runs inference on the test set and creates a submission file that can be submitted to the AIcrowd website.
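Two details of the submission code are easy to get wrong: the numeric sort (a plain string sort would put image_10 before image_2) and the final layout, which is a single flat uint8 array with the masks concatenated in image order. The sketch below checks both on tiny fake data; the file names and 2x2 masks are hypothetical stand-ins for the real 400x400 predictions, and an in-memory buffer stands in for submission.npy.

```python
import io
import numpy as np

def natural_key(path):
    # Same idea as the article's sort key: the integer after the last '_' in the name
    return int(path.split('_')[-1].split('.')[0])

files = ['test_images/image_10.jpg', 'test_images/image_2.jpg', 'test_images/image_1.jpg']
print(sorted(files, key=natural_key))   # image_1, image_2, image_10 (numeric order)

# Two tiny fake 2x2 predicted masks
masks = [np.array([[0, 1], [1, 1]]), np.array([[0, 0], [1, 0]])]
flat = np.asarray([m.flatten() for m in masks]).reshape(-1).astype(np.uint8)

buf = io.BytesIO()              # in-memory stand-in for submission.npy
np.save(buf, flat)
buf.seek(0)
loaded = np.load(buf)
print(loaded, loaded.dtype)     # [0 1 1 1 0 0 1 0] uint8
```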

Conclusion

This is definitely not a complete solution that will get you the best result on the leaderboard, but it is a great starting point on which to build your own. A few possible improvements are cross-validation, ensembling, test time augmentation, and trying loss functions other than cross-entropy. I’d like to conclude by congratulating AIcrowd’s team for creating this wonderful platform and running this awesome, short blitz competition, which is a great way for beginners to step into the world of machine learning.
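As an illustration of one of those improvements, here is a minimal sketch of flip-based test time augmentation: predict on the image and on its horizontal flip, flip the second prediction back, and average. `predict_probs` is a hypothetical function returning per-pixel water probabilities; with the fastai learner you would have to write such a wrapper yourself. The toy model below exists only to make the sketch runnable.

```python
import numpy as np

def flip_tta(predict_probs, image):
    """Average water probabilities over the original image and its horizontal flip.

    predict_probs maps an (H, W, C) image to an (H, W) array of water probabilities.
    The flipped prediction is flipped back before averaging so pixels line up.
    """
    probs = predict_probs(image)
    flipped = predict_probs(image[:, ::-1])[:, ::-1]
    return (probs + flipped) / 2

# Toy "model": water probability is the first channel scaled to [0, 1]
def toy_predict(image):
    return image[..., 0] / 255.0

image = np.zeros((2, 3, 3))
image[:, 0, 0] = 255                      # left column is water
mask = (flip_tta(toy_predict, image) > 0.5).astype(np.uint8)
print(mask)
# [[1 0 0]
#  [1 0 0]]
```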


Written by Ashwin