How I Tackled My First Kaggle Challenge Using Deep Learning – Part 1

For the past few weeks, I have been taking the free and excellent fast.ai online course, which teaches deep learning from a practical perspective. Coming from a programming background, I found this to be the right approach. I have, however, been supplementing my theoretical understanding with other materials (I highly recommend the notes from Stanford's CS231n course).

Enter Kaggle

Kaggle is the battle arena and training ground for applied deep learning challenges, and I have been drawn to one in particular: the State Farm Distracted Driver Detection challenge. In this challenge we are given a training set of about 20K photos of drivers who are either focused or distracted (e.g. holding a phone, applying makeup, etc.). The test set consists of around 80K images. The goal is to build a model that accurately classifies a given driver photo into one of 10 classes while minimising the log loss (i.e. the penalty grows with the negative logarithm of the probability your classifier assigns to the correct class, so confident wrong predictions are punished most heavily).
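
To make the metric concrete, here is a minimal sketch of multi-class log loss in plain Python. The function name `multiclass_log_loss` and the clipping constant `eps` are my own choices for illustration; this is not Kaggle's exact scoring code.

```python
import math

def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip so that a probability of exactly 0 or 1 never reaches log()
        p = min(max(probs[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(true_labels)

# One image whose true class is 0, scored under three different predictions
confident_right = multiclass_log_loss([0], [[0.9] + [0.1 / 9] * 9])
uniform_guess = multiclass_log_loss([0], [[0.1] * 10])
confident_wrong = multiclass_log_loss([0], [[0.01] + [0.11] * 9])
print(confident_right, uniform_guess, confident_wrong)
```

A uniform guess over 10 classes scores ln(10), roughly 2.30, which makes a handy baseline: any loss above that is worse than not looking at the image at all.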

I am interested in the State Farm challenge because I am currently building KamCar, an AI-powered dash cam application for phones that will make driving a safer and richer experience. One thing that KamCar will do is detect driver distraction/drowsiness and alert drivers to avoid catastrophes. Driver distraction is responsible for 20% of car accidents according to the CDC, and I believe that with the current advances in deep learning and the increasing power of smartphones, we can do more to tackle this scourge.

Driver in different distracted states

Before closely following the methodology of Jeremy Howard (fast.ai co-founder), I tried my own naive methods, which yielded poor results.

Step 1 – Get your validation set right

As far as I can tell, there is no clear-cut rule as to how much of the training set should go into the validation set, so I settled on an 80/20 split between the training and validation sets.

When I started tackling the State Farm challenge, I simply moved 20% of the images, chosen at random across all 10 classes, from the training set to the validation set. But when I ran a simple linear model against the data, the loss on the training set was huge (over 14), while the validation accuracy failed to go beyond 17%.

I went back to the State Farm challenge page, read its details again and noticed the following:

The train and test data are split on the drivers, such that one driver can only appear on either train or test set.

Interesting… I thought… Since we validate (and not train) our model against the validation set, it should exhibit similar properties to the test set, right? The solution, therefore, is to split the training and validation sets by driver, so that the drivers in the validation set never appear in the training set. State Farm conveniently provides a CSV file that maps each driver ID to file names, so the split was quite straightforward:

import pandas as pd
import random
# `path` is the dataset root directory (defined elsewhere)
df = pd.read_csv(path + '/driver_imgs_list.csv')
by_drivers = df.groupby('subject')
# Convert to a list: random.shuffle needs a mutable sequence
unique_drivers = list(by_drivers.groups.keys())
# Set validation set percentage with regards to training set
val_pct = 0.2
random.shuffle(unique_drivers)
# These are the drivers we will be entirely moving to the validation set
to_val_drivers = unique_drivers[:int(len(unique_drivers) * val_pct)]
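
Once the validation drivers are chosen, every image has to be assigned to one side or the other. The sketch below shows one way this could look using only the standard library. The function name `split_by_driver`, the fixed `seed`, and the `'img'` key are assumptions made for illustration (only the `'subject'` column appears in the code above), and the rows are made up.

```python
import random

def split_by_driver(rows, val_pct=0.2, seed=42):
    """Split image rows so that no driver appears in both sets.

    `rows` is a list of dicts with (assumed) keys 'subject' and 'img'.
    """
    drivers = sorted({row['subject'] for row in rows})
    random.Random(seed).shuffle(drivers)
    # Always hold out at least one driver
    n_val = max(1, int(len(drivers) * val_pct))
    val_drivers = set(drivers[:n_val])
    train = [r['img'] for r in rows if r['subject'] not in val_drivers]
    val = [r['img'] for r in rows if r['subject'] in val_drivers]
    return train, val, val_drivers

# Tiny made-up example: 5 drivers with 4 images each
rows = [{'subject': 'p%03d' % d, 'img': 'img_%d_%d.jpg' % (d, i)}
        for d in range(5) for i in range(4)]
train_imgs, val_imgs, val_drivers = split_by_driver(rows)
print(len(train_imgs), len(val_imgs), sorted(val_drivers))
```

Each file listed in `val_imgs` could then be moved into the validation folder, for example with `shutil.move`.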

Step 2 – Start with a sample set

Since the training and validation sets together amount to about 20K images, it is still fairly time-consuming to train models on this volume of data when you are just checking whether some settings work.

A sample set is just a subset of training and validation sets: your model will train against the sample training set for quick evaluation of what works and what doesn’t. Doing so enabled me to save a lot of time! I chose my sample set to be around 20% of the data, where images were randomly copied over to the sample set.
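
As a rough sketch of how such a sample set could be built, the helper below copies a random fraction of each class folder into a parallel sample directory. The name `copy_sample`, the per-class sampling, and the one-subfolder-per-class layout are my assumptions, not necessarily what was done for this post. The demo runs on empty placeholder files in a temporary directory.

```python
import random
import shutil
import tempfile
from pathlib import Path

def copy_sample(src_dir, dst_dir, sample_pct=0.2, seed=0):
    """Copy a random fraction of each class folder from src_dir to dst_dir."""
    rng = random.Random(seed)
    copied = 0
    for class_dir in sorted(Path(src_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_sample = int(len(files) * sample_pct)
        target = Path(dst_dir) / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n_sample):
            shutil.copy(f, target / f.name)
            copied += 1
    return copied

# Demo on placeholder files: 2 classes with 10 empty "images" each
with tempfile.TemporaryDirectory() as tmp:
    for cls in ('c0', 'c1'):
        (Path(tmp) / 'train' / cls).mkdir(parents=True)
        for i in range(10):
            (Path(tmp) / 'train' / cls / ('img_%d.jpg' % i)).touch()
    n_copied = copy_sample(Path(tmp) / 'train', Path(tmp) / 'sample' / 'train')
    sample_counts = {d.name: len(list(d.iterdir()))
                     for d in (Path(tmp) / 'sample' / 'train').iterdir()}
print(n_copied, sample_counts)
```

Copying rather than moving keeps the full training and validation sets intact for later runs.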

Step 3 – Try a barebones model

The model below is as simple as it gets; in fact, it has no convolutional layer at all:

from keras.models import Sequential
from keras.layers import BatchNormalization, Flatten, Dense

img_size_1D = 224  # images are 224x224
num_classes = 10

def linear_model():
    model = Sequential()
    # image size is 3 (RGB) x 224x224 (WidthxHeight), channels first
    model.add(BatchNormalization(axis=1, input_shape=(3, img_size_1D, img_size_1D)))

    model.add(Flatten())
    # here we have 10 classes
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

It was interesting to see how it managed to achieve close to 40% accuracy on the validation set with neither convolution nor regularisation!

model = linear_model()
model.optimizer.lr = 10e-5  # note: 10e-5 is the same value as 1e-4
model.fit_generator(sample_train_batches, samples_per_epoch=sample_train_batches.nb_sample, nb_epoch=3, 
                   validation_data=sample_val_batches, nb_val_samples=sample_val_batches.nb_sample, verbose=1)
Epoch 1/3
1803/1803 [==============================] - 29s - loss: 5.7358 - acc: 0.2806 - val_loss: 10.9750 - val_acc: 0.1741
Epoch 2/3
1803/1803 [==============================] - 24s - loss: 1.6279 - acc: 0.6339 - val_loss: 4.6160 - val_acc: 0.3304
Epoch 3/3
1803/1803 [==============================] - 24s - loss: 0.5111 - acc: 0.8358 - val_loss: 3.1399 - val_acc: 0.3951

Obviously, it massively overfits the training set: after just 3 epochs it reaches over 80% training accuracy, while the validation accuracy is less than half that. The problem is that our simple model has effectively memorised the training images, which prevents it from generalising to images of drivers it has never come across before (and guess what, they are all in the validation set!).
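
A quick back-of-the-envelope count helps explain why memorisation comes so easily. The arithmetic below uses only numbers already shown: a 3 x 224 x 224 input, 10 classes, and the 1803 sample training images from the log. The variable names are mine, and the few batch normalisation parameters are ignored.

```python
# Sizes taken from the linear model and its training log
channels, height, width = 3, 224, 224
num_classes = 10
sample_train_images = 1803

inputs_per_image = channels * height * width
# One weight per (input value, class) pair, plus one bias per class
dense_params = inputs_per_image * num_classes + num_classes
params_per_image = dense_params / sample_train_images
print(inputs_per_image, dense_params, round(params_per_image))
```

With about 1.5 million trainable values in the dense layer and fewer than two thousand images, the model has hundreds of parameters available per training image.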

Step 4 – Add some convolution in your life

Example of a convolutional neural network – from Adit Deshpande’s blog https://adeshpande3.github.io/

OK, this is where the fun really starts… I created a model with a few convolutions to test whether accuracy would improve with such an architecture:

from keras.models import Sequential
from keras.layers import BatchNormalization, Convolution2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

def simple_convnet():
    # Keras 1 API with channels-first input (3, 224, 224), hence axis=1
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Convolution2D(32,3,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Convolution2D(64,3,3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

Surprisingly (for me at least), the model performed quite poorly on the validation set, but reached 100% accuracy on the training set pretty quickly:

Epoch 1/3
1803/1803 [==============================] - 34s - loss: 1.2825 - acc: 0.6184 - val_loss: 2.0999 - val_acc: 0.2612
Epoch 2/3
1803/1803 [==============================] - 25s - loss: 0.2360 - acc: 0.9590 - val_loss: 2.2691 - val_acc: 0.2098
Epoch 3/3
1803/1803 [==============================] - 26s - loss: 0.0809 - acc: 0.9939 - val_loss: 2.4817 - val_acc: 0.1808
Epoch 1/3
1803/1803 [==============================] - 30s - loss: 0.0289 - acc: 0.9994 - val_loss: 2.6927 - val_acc: 0.1585
Epoch 2/3
1803/1803 [==============================] - 29s - loss: 0.0160 - acc: 1.0000 - val_loss: 2.7905 - val_acc: 0.1540
Epoch 3/3
1803/1803 [==============================] - 26s - loss: 0.0128 - acc: 1.0000 - val_loss: 2.7741 - val_acc: 0.1562
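
One plausible contributor to how quickly this network fits the sample set is its size. The sketch below traces the feature map width through the layers of the convnet, assuming 'valid' 3x3 convolutions with stride 1 and pooling with a stride equal to the pool size (which I believe are the Keras 1 defaults, but treat that as an assumption). The helper names are mine.

```python
def conv_out(size, kernel):
    """Output width of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width of non-overlapping max pooling (stride == pool size)."""
    return size // pool

size = 224
size = conv_out(size, 3)   # after the first 3x3 convolution
size = pool_out(size, 3)   # after the first 3x3 max pooling
size = conv_out(size, 3)   # after the second 3x3 convolution
size = pool_out(size, 3)   # after the second 3x3 max pooling

flattened = 64 * size * size               # 64 filters in the last conv layer
dense_params = flattened * 200 + 200       # weights + biases of Dense(200)
print(size, flattened, dense_params)
```

Under these assumptions the first dense layer alone holds over 7 million parameters, far more than the 1803 sample images can constrain.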

Step 5 – When there’s a will, there’s an augmentation

Data augmentation applies random modifications to each image in order to reduce the model's ability to memorise specific images. Augmentation types include rotation, width/height shifts, shear, and RGB channel shifts. I tried a bunch of parameters and settled on the following for best results:

from keras.preprocessing import image
gen_all = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, shear_range=0.15, channel_shift_range=10, width_shift_range=0.1)
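
To give a feel for what one of these settings does, here is a simplified pure-Python illustration of the idea behind a channel shift with a range of 10. It is not Keras's implementation, which works on arrays and may differ in details such as clipping. The function `channel_shift` and the tiny 2x2 image are made up for this example.

```python
import random

def channel_shift(pixel_rows, max_shift=10, rng=random):
    """Add one random offset per RGB channel, clipped to [0, 255].

    `pixel_rows` is a list of rows, each row a list of (r, g, b) tuples.
    """
    # The same offset is applied to every pixel of a given channel
    offsets = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    shifted = []
    for row in pixel_rows:
        shifted.append([
            tuple(min(255, max(0, round(value + offset)))
                  for value, offset in zip(pixel, offsets))
            for pixel in row
        ])
    return shifted

image_2x2 = [[(120, 64, 200), (0, 255, 30)],
             [(15, 15, 15), (250, 128, 5)]]
augmented = channel_shift(image_2x2, max_shift=10, rng=random.Random(1))
print(augmented)
```

Because the offsets are redrawn on every call, the network sees a slightly different colour cast each time the same photo comes around, which makes pixel-exact memorisation harder.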

After many runs, I managed to achieve 60% accuracy on the validation set, which is very encouraging for things to come:

Epoch 13/15
1803/1803 [==============================] - 26s - loss: 0.4834 - acc: 0.8697 - val_loss: 1.4806 - val_acc: 0.5625
Epoch 14/15
1803/1803 [==============================] - 26s - loss: 0.4944 - acc: 0.8658 - val_loss: 1.4361 - val_acc: 0.5759
Epoch 15/15
1803/1803 [==============================] - 27s - loss: 0.4959 - acc: 0.8597 - val_loss: 1.3884 - val_acc: 0.6004

Not bad for such a simple model, wouldn’t you agree?

Not bad at all, according to Michelle Obama

I am working on Part 2 of this series, where we will use the full dataset and perform transfer learning by leveraging a pre-trained VGG model. Code will soon be available on my GitHub as well.

Stay tuned, share if you like and don’t hesitate to leave a comment :).


_I am building KamCar, the AI-powered dash cam app to make driving a safer and richer experience. If you are a mobile developer and want to work on some exciting tech and a product that has a REAL impact, or just someone who wants to give some advice, hit me up on Twitter or here :)._

