How to improve your Kaggle competition leaderboard ranking

Tips from a new ‘Kaggler’ building CNNs for blindness detection


After recently competing in the 2019 APTOS Blindness Detection Kaggle competition and finishing in the top 32%, I thought I would share my process for training convolutional neural networks. My only prior deep learning experience was completing the Deeplearning.ai Specialisation, so that is all the background you should need to follow this article.

Sections of this article

  1. Competition context
  2. Keeping a logbook
  3. Get more data
  4. Leveraging existing kernels
  5. Preprocessing images
  6. Training is a very very slow process (but don’t worry)
  7. Transfer learning
  8. Model selection

Competition context

I spent the last 2–3 months working on and off on the APTOS 2019 Blindness Detection Competition on Kaggle, which required you to grade images of people’s eyes into 5 categories of diabetic retinopathy severity. In the rest of this article I share tips and tricks you can use in any Kaggle vision competition, as I feel that most of what I learned here is universally applicable.

Keep a logbook

Like any good scientific experiment, we should change one thing at a time and compare our results to our control. When training a CNN (convolutional neural network) we should do likewise, recording each change and its result in a logbook. Here’s the one I used for the blindness detection challenge.

My logbook for Kaggle APTOS Blindness Detection Challenge

I don’t claim that the exact table I use here is ideal (far from it), but I found it useful for identifying, each time I made a change, whether the model improved cumulatively on the previous changes. Regardless, I highly recommend you keep some form of logbook, as otherwise it is very difficult to tell whether anything you are doing is working.

Some ideas I have for my next competition logbook are to:

  1. Establish a single baseline model to compare all future changes against (see the logging sketch below).
  2. Come up with a set of tweaks to try, then run a modified version of the baseline for each tweak independently, rather than cumulatively.
  3. Keep the same (and smallest) CNN architecture for as long as possible, as this makes iteration quicker and, with some luck, many of the hyper-parameters should transfer decently to larger, more complex models.
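As a concrete starting point for idea 1, here is a minimal logging sketch in plain Python. The column names are hypothetical; adapt them to whatever you actually vary between runs. The point is simply that every run gets one row you can compare against the baseline later.

```python
import csv
from datetime import datetime
from pathlib import Path

LOG_PATH = Path("logbook.csv")
FIELDS = ["timestamp", "run_id", "change_vs_baseline", "val_score", "notes"]

def log_run(run_id, change_vs_baseline, val_score, notes=""):
    """Append one row per experiment so every change stays comparable."""
    new_file = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "run_id": run_id,
            "change_vs_baseline": change_vs_baseline,
            "val_score": val_score,
            "notes": notes,
        })

# Usage: log_run("run_07", "mixup alpha=0.2", 0.791, "slight gain over baseline")
```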

Get more data

Do some research before you start coding to see whether a similar competition has been run before, or whether there are databases of similarly labelled training sets you can use. More data is rarely harmful to your model (assuming the quality of labelling is decent), so get as much of it as you can. Just remember to draw your validation and test sets from the original dataset provided to you, or you may end up with a train-test mismatch.
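Here is one way to fold in an external labelled set while keeping validation strictly inside the original competition data. This is a PyTorch sketch; the folder paths are placeholders, and it assumes your images are arranged one subfolder per class so torchvision’s ImageFolder can read them.

```python
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# Hypothetical layout: one subfolder per severity grade (0-4).
original = datasets.ImageFolder("train_images/", transform=tfm)
external = datasets.ImageFolder("external_images/", transform=tfm)

# Split the *original* data first, so validation never contains external images.
n_val = int(0.2 * len(original))
train_orig, val_set = random_split(original, [len(original) - n_val, n_val])

# Only the training portion gets the extra data.
train_set = ConcatDataset([train_orig, external])
```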

Leveraging existing kernels

If you’re new to deep learning competitions (like me), you probably don’t want to write your entire notebook from scratch, especially when someone else has probably already posted a starter kernel for your competition (why reinvent the wheel?). Tweaking someone else’s model will save you a bunch of debugging time and get you onto learning new stuff faster.

This was a good starter kernel that I used and retrofitted for almost all of my further trials.

A word of warning: if a kernel suggests a bunch of techniques for your model, check whether it states the resulting performance gains. Otherwise, be sceptical and run tests yourself before blindly incorporating them into your own models :)

Preprocessing Images

Cropping & Other Augmentations: This step is a must, as training images may be in a very raw state. For example, in the blindness detection challenge the images were all cropped at different ratios, which meant a naive algorithm could overfit to the black space around the eye, which was more prevalent in one class than another.

Source: https://www.kaggle.com/taindow/be-careful-what-you-train-on

Hence cropping and resizing images in a robust way was a crucial part of this competition. There are also many image augmentation techniques, such as random cropping, rotation, and contrast and brightness adjustments, with which I had varying degrees of success.
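A sketch of the kind of robust crop this calls for: threshold away the near-black background and keep the bounding box of the retina. This uses OpenCV and NumPy; the tolerance value is an assumption you would tune on your own data.

```python
import cv2
import numpy as np

def crop_black_border(img, tol=7):
    """Crop away rows/columns that are (almost) entirely black."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mask = gray > tol                      # True where the image is not background
    if not mask.any():                     # fully black image: leave untouched
        return img
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

img = cv2.imread("sample_eye.png")         # hypothetical file name
img = cv2.resize(crop_black_border(img), (224, 224))
```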

Imbalanced classes: Invariably there are more training examples for some classes than others, and you need to address this before you start training. A combination of techniques that works reasonably well is over-/under-sampling together with mixup (Zhang et al., 2018) during mini-batch gradient descent.
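A minimal PyTorch sketch of both fixes: oversampling minority classes with WeightedRandomSampler, and a mixup function applied inside the training loop. `train_set` is assumed from your own pipeline, and mixup’s label blending requires one-hot (soft) targets and a loss that accepts them.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# --- Oversampling: draw each class with roughly equal probability ---
labels = np.array([y for _, y in train_set])   # note: iterates the whole dataset
class_counts = np.bincount(labels)
weights = 1.0 / class_counts[labels]           # rarer class => higher draw weight
sampler = WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                num_samples=len(labels), replacement=True)
loader = DataLoader(train_set, batch_size=32, sampler=sampler)

# --- Mixup (Zhang et al., 2018): blend random pairs of inputs and labels ---
def mixup(x, y_onehot, alpha=0.2):
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y_onehot + (1 - lam) * y_onehot[idx]
```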

Preprocessing Computation: Often the dataset will be quite large, so rudimentary procedures such as standardising the size and cropping of images should be done in a separate kernel (or offline, depending on the size of the dataset) and re-uploaded as a modified version of the original data. Otherwise you will repeat this computation at every epoch / run of your model, which is a terrible waste of time.
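A sketch of doing that expensive preprocessing once, offline. The paths are hypothetical, and it reuses the `crop_black_border` function from the earlier sketch; you would run this in its own kernel or script and upload the output directory as a new Kaggle dataset.

```python
from pathlib import Path
import cv2

SRC, DST, SIZE = Path("train_images"), Path("train_images_224"), (224, 224)
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.png"):
    img = cv2.imread(str(path))
    img = crop_black_border(img)                   # crop function defined above
    cv2.imwrite(str(DST / path.name), cv2.resize(img, SIZE))
```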

Training is a very very slow process (but don’t worry)

Now that you’ve written your first kernel, you need to test it out! Kaggle kernels can run for up to 9 hours (the kernel time limit may vary by competition), and because the site is running many models at once, it can be slower at some times of day than others. My best advice is to first run the kernel quickly in the browser for one or two iterations to make sure you haven’t made any errors, then gather several ideas you want to test, hit commit on all of them, and check back in a few hours. Note that if you hit commit, rather than just running the kernel, you don’t have to keep your laptop running :)
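One way to do that quick sanity check is a debug flag that trains on a couple of batches before you commit the full run. This sketch assumes a `model`, `loss_fn`, `optimizer` and `loader` already defined elsewhere in your kernel.

```python
DEBUG = True  # flip to False for the full committed run

for epoch in range(1 if DEBUG else 30):
    for i, (x, y) in enumerate(loader):
        if DEBUG and i >= 2:   # just prove the pipeline runs end to end
            break
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```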

Transfer Learning

You won’t be training any sufficiently large model from scratch. Typically we just take a large model pre-trained on ImageNet (or some other large dataset) and fine-tune it for our purposes. In almost all cases you should unfreeze all layers of the model during fine-tuning, as the results are likely to be the most stable.

This is nicely illustrated by this chart (Yosinski et al., 2014), where two networks are trained on datasets A and B, the network is then chopped at layer n, and the earlier layers are either frozen or fine-tuned (indicated by +). The conclusion can be seen in the second figure: the top line, AnB+, with all 7 layers fine-tuned, produces the best top-1 accuracy.

(Yosinski et al. 2014)
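A sketch of this fine-tuning setup using torchvision: load an ImageNet-pretrained ResNet-50, replace the head for the 5 severity grades, and leave every layer unfrozen (`requires_grad` stays True by default). The learning rate here is only an assumption; a smaller one than you would use from scratch is typical.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 5)   # new head for the 5 classes

# All parameters remain trainable, so the whole network is fine-tuned.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```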

Model selection

You’re probably best off starting with a smaller model (like ResNet-50), then trying some larger architectures (ResNet-101, Inception nets, EfficientNets). All of these networks have papers available, which are definitely worth a read before you go ahead and use them. Typically, though, you should expect better accuracy from newer models than from older ones.
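One way to keep the rest of the pipeline fixed while trying architectures is a small factory over torchvision backbones, so the architecture is the only variable between logged runs. A sketch, restricted to the two ResNets since both expose a `.fc` head:

```python
import torch.nn as nn
from torchvision import models

BACKBONES = {"resnet50": models.resnet50, "resnet101": models.resnet101}

def build_model(arch="resnet50", n_classes=5):
    m = BACKBONES[arch](pretrained=True)
    m.fc = nn.Linear(m.fc.in_features, n_classes)  # both ResNets expose `.fc`
    return m
```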

Closing Remarks

With the information I’ve provided above, you should be able to get a really decent score on both the public and private leaderboards.

My intuition from competing in this challenge is that getting into the top 30 on the public leaderboard gives you a good chance of finishing in the top 10 on the private board, given the uncertainty associated with the remaining held-out dataset.

In the APTOS challenge, the gap between the top 32% and 1st place was less than a 3% improvement on my score, so just keep on tweaking your model!

References

Yosinski, J., Clune, J., Bengio, Y. and Lipson, H. (2014). How transferable are features in deep neural networks? [online] arXiv.org. Available at: https://arxiv.org/abs/1411.1792.

Zhang, H., Cisse, M., Dauphin, Y. and Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. [online] arXiv.org. Available at: https://arxiv.org/abs/1710.09412 [Accessed 8 Sep. 2019].
