Kaggle Planet Competition: How to land in top 4%

Irshad Muhammad
Towards Data Science
8 min read · Jan 14, 2018

--

In this blog post, we will learn how to achieve a world-class result in one of the most famous Kaggle competitions, “Planet: Understanding the Amazon from Space”.

The techniques used in this blog are generic and can be applied to any other multi-label image classification problem. All of these techniques are taught in the fast.ai deep learning MOOC. In this tutorial we will be using the fastai deep learning library, which is built on top of PyTorch.

Downloading the competition data

Let’s get started. Log in to your Kaggle account, head over to the competition page, accept the rules of the competition, go to the Data tab, and download the following files.

  • sample_submission_v2.csv.zip
  • test-jpg-additional.tar.7z
  • test-jpg.tar.7z
  • train-jpg.tar.7z
  • train_v2.csv.zip

We can download the files using kaggle-cli, which is useful when you are working on a cloud VM instance such as AWS or Paperspace. To download files using kaggle-cli, use the following command.

$ kg download -u <username> -p <password> -c planet-understanding-the-amazon-from-space -f <name of file>

where planet-understanding-the-amazon-from-space is the name of the competition. You can find the competition name at the end of the competition URL, after the /c/ part: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space. There is also another great Chrome extension, Curl Widget, for downloading data on cloud instances. You can check it out here.

Once the download is complete, we can extract the files using the following commands.

# To extract .7z files
7z x -so <file_name>.7z | tar xf -
# To extract .zip files
unzip <file_name>.zip

Once the extraction is complete, move all files from the folder test-jpg-additional to test-jpg.

Our data is ready. Let’s start building the model.

Initial Model

Note: Only the important snippets of code are shown in this blog post; the full notebook is available here.

If you read the evaluation criteria of the competition, you will see that it is based on the F2 score, so we define the metrics for our model accordingly. We will be using a pre-trained implementation of the deep residual network resnet34, which was made public by Microsoft.
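For intuition, here is what the metric computes: F2 is the F-beta score with beta = 2, which weights recall twice as heavily as precision, averaged over images for this multi-label task. A minimal NumPy sketch (for understanding only, not the implementation the library uses):

```python
import numpy as np

def f2_score(y_true, y_pred, eps=1e-9):
    """Mean per-image F2 score for multi-label 0/1 arrays.

    F-beta with beta = 2 weights recall twice as much as precision:
    F2 = 5 * P * R / (4 * P + R).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tp = (y_true * y_pred).sum(axis=1)            # true positives per image
    precision = tp / (y_pred.sum(axis=1) + eps)
    recall = tp / (y_true.sum(axis=1) + eps)
    f2 = 5 * precision * recall / (4 * precision + recall + eps)
    return f2.mean()

# A perfect prediction scores (essentially) 1.0, and because recall is
# weighted higher, missing a true label costs more than adding a wrong one.
print(f2_score([[1, 0, 1], [0, 1, 0]], [[1, 0, 1], [0, 1, 0]]))
```

Because of this recall weighting, the optimal decision threshold later in the post ends up lower than the usual 0.5.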

Get 20% of the available training data as validation data, and load the pre-trained model.

Finding the Learning Rate

The learning rate (LR) is one of the most important hyperparameters of the model. It determines how fast or slow the model learns. If the LR is too high, the model will try to learn too fast and the loss function will not converge. If the LR is too low, the model will take too long to converge.

Finding a good learning rate with the fastai library is very easy; just run the following two lines of code. (It finds the LR using a technique introduced in the research paper Cyclical Learning Rates for Training Neural Networks.)

This will plot a graph of the loss function against the LR. A good value for the LR is where the slope of the loss function is steepest. As we can see, the slope is steepest around 0.1, and you can use any value close to it. It would be a good idea to experiment with a few values around 0.1 to find the optimal LR. After experimenting with a few values, 0.2 seemed to work best for me.
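The selection rule described above — pick the LR where the loss curve falls fastest — can be sketched numerically. The sweep data below is synthetic, shaped like a typical LR-finder plot, just to illustrate the idea:

```python
import numpy as np

def steepest_lr(lrs, losses):
    """Pick the LR where loss is falling fastest: the most negative slope
    of loss vs. log10(LR), i.e. how you read the LR-finder plot by eye."""
    slopes = np.gradient(np.asarray(losses), np.log10(lrs))
    return float(lrs[int(np.argmin(slopes))])

# Synthetic sweep: loss is flat at tiny LRs, drops steeply around 1e-1,
# then blows up once the LR is too large for training to converge.
lrs = np.logspace(-5, 1, 200)
x = np.log10(lrs)
losses = 1.5 - np.tanh(x + 1) + np.exp(2 * (x - 0.5))
print(steepest_lr(lrs, losses))  # steepest descent is near lr = 0.1
```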

Training the model

The full size of the chips in the competition is 256x256. We start training our model at 64x64 and gradually increase the image size as training progresses. This is a very good technique to avoid over-fitting.

The output has the following format:

[ <epoch number> <train loss> <val loss> <val set F2 score> ]

Let’s try to understand the fit function. When training the model, fastai implements a technique called stochastic gradient descent with restarts (SGDR) (paper link). It trains the model in cycles, where each cycle consists of one or more epochs. Each cycle starts at the given LR value and gradually anneals the LR (the SGDR paper uses a cosine annealing schedule) as training progresses. The second parameter of fit denotes the total number of cycles. The number of epochs in a cycle is controlled by two parameters, cycle_len and cycle_mult, as follows.

epochs in first cycle = cycle_len
epochs in second cycle = epochs in previous(first) cycle x cycle_mult
epochs in third cycle = epochs in previous(second) cycle x cycle_mult
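The arithmetic above is easy to sketch. For example, a call like learn.fit(0.2, 3, cycle_len=1, cycle_mult=2) (parameter names as used in the course's fastai library) would train for 1 + 2 + 4 = 7 epochs in total:

```python
def epochs_per_cycle(n_cycles, cycle_len=1, cycle_mult=2):
    """Length of each SGDR cycle in epochs: the first cycle runs cycle_len
    epochs, and every later cycle is cycle_mult times the previous one."""
    cycles, length = [], cycle_len
    for _ in range(n_cycles):
        cycles.append(length)
        length *= cycle_mult
    return cycles

# 3 cycles with cycle_len=1, cycle_mult=2 → cycles of 1, 2 and 4 epochs.
print(epochs_per_cycle(3, cycle_len=1, cycle_mult=2))  # → [1, 2, 4]
```

Lengthening the later cycles gives the model more and more epochs between restarts as it settles into a good minimum.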

Here is the graph that shows change in LR for each cycle.

Unfreezing all layers and setting different learning rates for each layer group

By default, fastai freezes the weights of all layers except the last few and the ones it adds to fine-tune the model for the given dataset. (Data scientists with a background in Keras will appreciate this: no need for model.pop, model.add, model.layers[index].trainable=False, model.compile, …)

So in the epochs above, all of the learning is done by those unfrozen last layers.
Next, we will unfreeze the weights of all of the layers to get more accuracy out of our model.

If you give fastai an array of 3 learning rates, it will divide the layers into 3 groups: [<initial convolutional layers>, <remaining convolutional layers>, <last fully connected layers>]. For each group, it will use the corresponding value from the array.

In a CNN, the initial layers learn to find simple features (like edges, color gradients, corners), and these features are just as helpful for the planet dataset, so we use the lowest LR for them. Higher layers in a CNN learn to find complex features (like geometric patterns, faces, specific objects, etc.). A higher LR for these layers helps them adapt more quickly to the given dataset.
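To make the layer-group idea concrete, here is a toy sketch of pairing three LRs with three groups. The layer names are illustrative, not fastai's actual group boundaries; the ratios follow the lrs = [lr/9, lr/3, lr] pattern used in the course notebooks:

```python
# Toy sketch of differential learning rates: the network is split into
# three layer groups, and each group trains with its own LR from the array.
# Layer names here are illustrative, not resnet34's actual group boundaries.
layer_groups = [
    ["conv1", "layer1"],             # earliest conv layers: edges, gradients
    ["layer2", "layer3", "layer4"],  # later conv layers: complex features
    ["head_fc"],                     # the head fastai adds for our 17 labels
]
lr = 0.2
lrs = [lr / 9, lr / 3, lr]  # lowest LR for the earliest, most generic layers

# Map every layer to the LR of its group.
group_lr = {layer: group_rate
            for layers, group_rate in zip(layer_groups, lrs)
            for layer in layers}
print(group_lr["conv1"], group_lr["head_fc"])  # → 0.0222… and 0.2
```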

As you can see from the output, our model has started to over-fit. I will stop the training here and increase the training image size to 128x128.

Similarly, train the model by increasing the image size to 256x256. This will finish the training phase of the model.

fastai has another very good feature called Test Time Augmentation (TTA). The idea is simple: apply simple augmentations to each test image to generate five copies of it, then make a prediction for each copy. Averaging these predictions gives a significant (1–2%) decrease in error. As you can see in the code below, the F2 score increased from 0.928 to 0.930 using TTA. This is a good score.
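The averaging step itself is just a mean over the predictions for the augmented copies. A small sketch with a stand-in model (not fastai's actual learn.TTA):

```python
import numpy as np

def tta_predict(predict_fn, image, augment_fns):
    """Test-time augmentation: predict on the original image plus one
    augmented copy per transform, then average the class probabilities."""
    copies = [image] + [aug(image) for aug in augment_fns]
    return np.stack([predict_fn(c) for c in copies]).mean(axis=0)

# Demo with a stand-in "model" whose outputs happen to be flip-invariant,
# so the TTA average equals the plain prediction.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
fake_model = lambda img: np.array([img.mean(), img.std()])
avg = tta_predict(fake_model, image, [np.fliplr, np.flipud])
```

Flips and small shifts do not change which classes are present in a satellite chip, so averaging over them reduces the variance of the prediction.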

Making first submission to Kaggle

In the submission file, we need to list the predicted labels for each image. Each image can belong to more than one class.

file_10770,agriculture clear cultivation primary road
test_26732,agriculture clear cultivation haze primary

If you look at an example of predictions from our validation set (figure below), you will see that our original labels are in the form of 1s and 0s, but our predictions are floating-point numbers. So we need to pick a threshold above which a class is included in the submission file (0.66 would be ideal for the example below).

The op_th function tries multiple thresholds in a given range and returns the one which maximizes the F2 score.
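A sketch of what such a threshold sweep does (an illustration of the idea, not the notebook's exact op_th):

```python
import numpy as np

def f2(y_true, y_pred, eps=1e-9):
    # Mean per-image F2 for 0/1 label matrices (see the metric above).
    tp = (y_true * y_pred).sum(axis=1)
    p = tp / (y_pred.sum(axis=1) + eps)
    r = tp / (y_true.sum(axis=1) + eps)
    return (5 * p * r / (4 * p + r + eps)).mean()

def opt_th(probs, y_true, start=0.05, end=0.5, step=0.01):
    """Sweep candidate thresholds on validation data and return the one
    that maximizes the F2 score."""
    ths = np.arange(start, end, step)
    scores = [f2(y_true, (probs > th).astype(float)) for th in ths]
    return float(ths[int(np.argmax(scores))])

# Toy validation set: present classes score >= 0.22 and absent classes
# score <= 0.10, so any threshold between those values is perfect.
y_true = np.array([[1., 0.], [0., 1.], [1., 1.]])
probs = np.array([[0.30, 0.10], [0.05, 0.25], [0.40, 0.22]])
print(opt_th(probs, y_true))
```

On the real validation predictions the sweep lands near 0.2, which is why the submissions below use a threshold of roughly that value.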

Now that we have our optimal threshold, let’s generate a submission file.
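Generating the file amounts to joining the names of all classes whose probability clears the threshold. A sketch with a shortened label list (the real competition has 17 labels):

```python
import numpy as np

# Shortened label list for illustration; the competition has 17 labels.
labels = ["agriculture", "clear", "cultivation", "haze", "primary", "road"]

def to_submission_rows(image_names, probs, threshold):
    """Build 'image_name,tags' rows by keeping every class whose predicted
    probability clears the chosen threshold."""
    rows = ["image_name,tags"]
    for name, p in zip(image_names, probs):
        tags = " ".join(l for l, v in zip(labels, p) if v > threshold)
        rows.append(f"{name},{tags}")
    return rows

probs = np.array([[0.9, 0.8, 0.6, 0.1, 0.95, 0.7]])
print(to_submission_rows(["file_10770"], probs, 0.2)[1])
# → file_10770,agriculture clear cultivation primary road
```

Writing the rows out with "\n".join(rows) gives a CSV ready to upload to Kaggle.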

Here are the submission results from Kaggle.

A private score of 0.92997 would have put us at 65th position out of 938. This initial model would have landed in the top 7%. That is quite an achievement. Let’s improve it further.

Private leader board of Kaggle planet competition.

Ensembling

My next goal was to land in the top 5%. I trained an ensemble of 5 resnet34 models. The training set of each model consisted of 90% of the available training data, with the remaining 10% as the validation set. I also applied slightly different data augmentation to the training set of each model. Here are the F2 scores for each model in the ensemble.

F2 score: 0.9285756738073999
F2 score: 0.9325735931488134
F2 score: 0.9345646226806884
F2 score: 0.9331467241762751
F2 score: 0.9349772800026489
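Combining the ensemble's predictions can be as simple as an unweighted mean of the per-model (TTA-averaged) probabilities; the sketch below illustrates that common choice, and the notebook has the actual ensembling code:

```python
import numpy as np

def ensemble_predict(per_model_probs):
    """Combine an ensemble by taking the unweighted mean of each model's
    predicted class probabilities (a simple, common choice)."""
    return np.mean(np.stack(per_model_probs), axis=0)

# Three hypothetical models' probabilities for one image, three classes.
preds = [np.array([[0.60, 0.10, 0.30]]),
         np.array([[0.70, 0.20, 0.10]]),
         np.array([[0.50, 0.30, 0.20]])]
avg = ensemble_predict(preds)
```

Because each model sees a different 90/10 split and different augmentations, their errors are partly independent, which is what makes the averaged prediction stronger than any single model.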

After seeing this, I became very hopeful about an improvement in position. I prepared the submission file and submitted it to Kaggle. Here are the results.

Kaggle submission result for ensemble

Here is the private leader board of Kaggle.

A private score of 0.93078 would have put us at 35th position out of 938. That is the top 3.7%.

So we have landed in the top 4%. GOAL ACHIEVED!

The code for the ensemble can be found here, or in the notebook at the bottom.

How to Improve Further

I will leave you with a few suggestions for further improvement.

  • A better way to find the threshold: When preparing the submission file, I used a single threshold of ~0.2 to select classes for all test images, but ideally each test image should have its own threshold, depending on the prediction values from the model. I experimented with training an ML model to find better thresholds but didn’t succeed. (The code is in the notebook.)
  • Figure out which classes the model predicts incorrectly: From the validation set predictions you can figure out which classes the model most often gets wrong. You can then add multiple, slightly augmented copies of images from those classes to the training set.
  • Use tiff instead of jpg: The training dataset is 12.87 GB in tiff format but only 600 MB in jpg for the same number of images, so the tiff data clearly carries much more information. Using it could improve the model further.
  • Try other architectures: I only used the resnet34 model; there are many other, more advanced models such as resnet50 and resnet101. You can experiment with those.
  • Reduce over-fitting: If you look at the training output of the ensemble in the notebook, you will see that some of my models started to over-fit during training. You can experiment with early stopping or dropout to reduce over-fitting.
  • Train an ensemble of multiple CNN architectures: My ensemble only used resnet34. You could build an ensemble that mixes multiple architectures such as resnet50, resnet101, etc.

If you enjoyed reading this, please clap. :-)
