Recognizing Cassava Leaf Diseases Using Transfer Learning

If you are like me, then most of the time you can’t remember what you dreamed. This morning, I remembered everything. I dreamed I got an email that announced I had won the Cassava Leaf Disease competition. Besides that, my status on the Kaggle website showed I had become a Kaggle Grandmaster.
Kaggle is an online community of data scientists and machine learning practitioners. Kaggle assigns you a rank based on how you perform in its competitions.
When I woke up, still a bit confused, I logged in to the Kaggle website. I was still a Kaggle newbie. My place on the public leaderboard of the Cassava Leaf Disease competition was 2343. So much for dreams coming true…
I joined the Cassava Leaf Disease competition shortly after passing the Developer Certification exam, to see if I could improve my Deep Learning skills.
In a previous article, I used Transfer Learning to create a model for the Dogs vs. Cats competition. Building that model was fun, but it had no immediate use in the real world.
This Kaggle competition is different. Kaggle named it Cassava Leaf Disease Classification. The goal is to identify the type of disease present on a Cassava leaf from an image.
Cassava is a key food in Africa, grown by small farmers. They grow Cassava because it withstands harsh conditions. But viral diseases are a major source of poor yields.
To fight these diseases, farmers need help from expensive agricultural experts. These experts first diagnose the condition. Then together with the farmer, they create a plan to control the disease.
We could help farmers by creating a solution that detects the disease from pictures taken with a mobile phone camera. Such a solution should work with low-quality images and low bandwidth.
A solution using image recognition with Deep Learning would be the ideal candidate.
In this article, we will develop a model using TensorFlow 2.4.1 and Python 3.8. We will start with a Convolutional Neural Network to get us on the leaderboard. Then we will improve our model using the DenseNet201 and EfficientNetB3 pre-trained models. Finally, we will use Test Time Augmentation (TTA) to improve our predictions.
If you are interested in the code, see this Github repository.
Training data
The training data consists of 21,397 JPEG images of Cassava leaves. Experts collected the photos during regular surveys in Uganda. The format realistically represents what farmers would need to diagnose in real life.
The images from the competition are divided into the following five categories.
- 0 → a leaf that shows Cassava Bacterial Blight (CBB)
- 1 → a leaf that shows Cassava Brown Streak Disease (CBSD)
- 2 → a leaf that shows Cassava Green Mottle (CGM)
- 3 → a leaf that shows Cassava Mosaic Disease (CMD)
- 4 → a healthy leaf

Organizing the training data
Kaggle stored all the training data in a single folder. Each image’s name represents its id. The mapping between the id and the label is stored in a separate CSV. Below you can see a sample from the CSV file.
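The exact filenames below are made up for illustration; the two columns, image_id and label, are how I remember the competition's train.csv.

```
image_id,label
1000001.jpg,3
1000002.jpg,0
1000003.jpg,4
```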
We need to restructure the images because I want to use the ImageDataGenerator class. We also need to reserve a part of the training images for validation. You can see the folder structure that the ImageDataGenerator needs below.
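It looks roughly like this, with one subfolder per label under a train and a validation directory (the data folder name is my own choice):

```
data/
├── train/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   ├── 3/
│   └── 4/
└── validation/
    ├── 0/
    ├── 1/
    ├── 2/
    ├── 3/
    └── 4/
```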
We have to read the CSV and copy the images into the correct folders, dividing the pictures between the train and validation set. The function distribute_images accepts a single parameter that defines the ratio between the train and validation images. A value of 0.7 selects 70% of the images for training and 30% for validation. The function reads the CSV and then iterates through the labels using the unique() method of the train_df data frame.
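Below is a minimal sketch of such a distribute_images function; the file and folder names (train.csv, train_images, data) are choices I made for this example.

```python
import os
import shutil
import pandas as pd

def distribute_images(train_ratio=0.7,
                      csv_path="train.csv",
                      source_dir="train_images",
                      target_dir="data"):
    """Copy each image into <target_dir>/(train|validation)/<label>/."""
    train_df = pd.read_csv(csv_path)

    for label in train_df["label"].unique():
        # All images with this label, shuffled so the split is random.
        images = train_df[train_df["label"] == label]["image_id"].sample(frac=1.0)
        split = int(len(images) * train_ratio)

        for subset, subset_images in [("train", images.iloc[:split]),
                                      ("validation", images.iloc[split:])]:
            folder = os.path.join(target_dir, subset, str(label))
            os.makedirs(folder, exist_ok=True)
            for image in subset_images:
                shutil.copy(os.path.join(source_dir, image),
                            os.path.join(folder, image))

distribute_images(0.7)
```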
Creating and Training the model
Instead of using Transfer Learning as we did with the Cats vs. Dogs competition, we will start with a small Convolutional Neural Network. Although Transfer Learning will probably get us higher on the leaderboard, let’s get a first score on the board to make sure that everything is working.
Creating the ImageDataGenerators
The ImageDataGenerators are used to send the training and validation images to the deep learning pipeline. The ImageDataGenerator for serving the training data uses Image Augmentation to increase the variety of the images. The ImageDataGenerator for the validation data only rescales the images.
The difference with the Cats vs. Dogs competition is that the class_mode is categorical, as we have more than two output labels.
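Here is a sketch of both generators; the 150×150 target size is what we use for this first model, while the specific augmentation settings and the batch size are assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment the training images; only rescale the validation images.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,        # illustrative augmentation settings
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

validation_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_generator = train_datagen.flow_from_directory(
    "data/train",
    target_size=(150, 150),
    batch_size=32,
    class_mode="categorical")    # more than two labels, so categorical

validation_generator = validation_datagen.flow_from_directory(
    "data/validation",
    target_size=(150, 150),
    batch_size=32,
    class_mode="categorical")
```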
Constructing and Compiling the CNN model
The first model that we are going to train is a small Convolutional Neural Network. The model uses four convolution layers, a Dropout layer, and a hidden Dense layer of 512 units.
To be able to predict the five different classes, the output layer has five units.
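Here is a sketch of create_cnn_model; the filter counts, the Dropout rate, and the optimizer settings are assumptions, the rest follows the description above.

```python
from tensorflow.keras import layers, models, optimizers

def create_cnn_model(input_shape=(150, 150, 3), num_classes=5):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),                       # assumed rate
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizers.Adam(learning_rate=1e-4),  # assumed
                  metrics=["accuracy"])
    return model

model = create_cnn_model()
```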
Training the model
We train for 50 epochs, although the callbacks we use will probably stop training earlier.
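The training call then looks roughly like this (create_callbacks() is shown below):

```python
history = model.fit(
    train_generator,
    epochs=50,                              # the callbacks usually stop earlier
    validation_data=validation_generator,
    callbacks=create_callbacks())
```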
Did you see that we set the callbacks parameter using the create_callbacks() method? The create_callbacks() method returns an array of three different callbacks; see below. TensorFlow calls each of these callbacks after every epoch.
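Here is a sketch of create_callbacks(); the patience values, the learning-rate factor, and the cassava_model.h5 filename are assumptions.

```python
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

def create_callbacks():
    return [
        # Stop when the validation accuracy has not improved for a while.
        EarlyStopping(monitor="val_accuracy", patience=5,
                      restore_best_weights=True),
        # Lower the learning rate when the validation loss plateaus.
        ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=2,
                          min_lr=1e-6),
        # Keep the model with the lowest validation loss seen so far.
        ModelCheckpoint("cassava_model.h5", monitor="val_loss",
                        save_best_only=True),
    ]
```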
The EarlyStopping callback stops training if the validation accuracy does not improve anymore. With the patience parameter, we set the number of epochs to wait before stopping when TensorFlow sees no progress.
The ReduceLROnPlateau callback automatically reduces the learning rate if it doesn’t see any improvement.
The ModelCheckpoint callback automatically saves the model after an epoch if the validation loss is lower than in any previous epoch. Sometimes, the model at the end of training is less accurate than the model one or two epochs before. With this callback, we always keep the best one.
Visualizing the training result
On my machine, this model achieves a maximum validation accuracy of 0.71 and a validation loss of 0.79.
The fit method on the model returns a history object. TensorFlow saves the result of each epoch in this history object. We can use the Matplotlib library to visualize the training.
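For example:

```python
import matplotlib.pyplot as plt

def plot_history(history):
    epochs = range(1, len(history.history["accuracy"]) + 1)

    # Accuracy curves
    plt.figure()
    plt.plot(epochs, history.history["accuracy"], label="training accuracy")
    plt.plot(epochs, history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("epoch")
    plt.legend()

    # Loss curves
    plt.figure()
    plt.plot(epochs, history.history["loss"], label="training loss")
    plt.plot(epochs, history.history["val_loss"], label="validation loss")
    plt.xlabel("epoch")
    plt.legend()

    plt.show()

plot_history(history)
```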
We see that the training and validation accuracy follow each other. The validation accuracy is fluctuating a bit, but that is ok.

The same is true for the training and validation loss. An excellent first baseline that we can submit to the competition.

Submitting the model to get a score
Now, here is where this competition differs from previous ones. Instead of submitting your predictions, you submit a notebook.
This notebook should create and train your model and create the predictions. Kaggle then runs the notebook in an isolated environment. In this environment, the notebook can access all the test images.
There are extra restrictions: the notebook cannot access the internet, and it must finish training and predicting in less than 9 hours.
Creating and saving the predictions
During training, we used ModelCheckpoint to save the best performing model. To create a prediction, we load this model using load_model. Then, to serve the images to the model, we use the ImageDataGenerator again; this time we only resize the images, without augmentation.
The last step is to save the predictions into a file called submission.csv using pandas. Kaggle prescribes the format of this file and reads it to score your submission.
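Here is a sketch of that prediction step; the Kaggle input path and the submission column names (image_id, label) are what I recall from the competition, so double-check them.

```python
import os
import numpy as np
import pandas as pd
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Load the best model that ModelCheckpoint saved during training.
model = load_model("cassava_model.h5")

# Serve the test images; rescale to match the training preprocessing.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
test_generator = test_datagen.flow_from_directory(
    "../input/cassava-leaf-disease-classification",   # assumed location
    classes=["test_images"],
    target_size=(150, 150),
    batch_size=1,
    class_mode=None,
    shuffle=False)

predictions = model.predict(test_generator)
labels = np.argmax(predictions, axis=1)

# Kaggle expects an image_id and a label column in submission.csv.
image_ids = [os.path.basename(name) for name in test_generator.filenames]
submission = pd.DataFrame({"image_id": image_ids, "label": labels})
submission.to_csv("submission.csv", index=False)
```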
I had some trouble getting Kaggle to pick up the submission file. It turned out I had saved the file in the wrong folder.
Result
One hour after submitting the notebook, Kaggle showed the score, which was 0.703 as expected.

The score put me in 2996th place on the public leaderboard – a good start for our first try.

In case you are wondering, the number one score on the public leaderboard is 0.911 at this moment, so we still have a long way to go 🙂

Improving our model using Transfer Learning
Previously, with the cats vs. dogs competition, we improved our model using Transfer Learning. We will try to do the same here.
However, there is one additional complexity. If you use one of the default Keras applications, it downloads the weights file from the internet. In this competition, our notebook cannot access the internet, so that won’t work. We will see how to work around this limitation later. Let’s first try to increase the accuracy of our model.
The only thing that we will change is the architecture of our model. The create_cnn_model function now looks like this.
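Here is a sketch; the pre-trained weights are downloaded in the training notebook, which does have internet access, and the optimizer settings are an assumption.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import DenseNet201

def create_cnn_model(input_shape=(150, 150, 3), num_classes=5):
    # Pre-trained convolutional base without the ImageNet classification head.
    base = DenseNet201(weights="imagenet", include_top=False,
                       input_shape=input_shape)

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizers.Adam(learning_rate=1e-4),  # assumed
                  metrics=["accuracy"])
    return model
```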
We create an instance of DenseNet201 and use it as the first part of our model. We then Flatten the result and add a single Dense layer with 512 units. The output layer is still the same Dense layer with five units, one for each category.
When we train the model, the accuracy increases for 23 epochs. Training and validation accuracy follow each other nicely, which shows a healthy model. The same is true for the training and validation loss. The validation accuracy climbs to around 0.76 while the validation loss flattens out at 0.70.


When we submit this model to Kaggle, it scores 0.772, which moves us up to 2927th place on the public leaderboard. An excellent step upwards.


Submitting the model to Kaggle
When you want to submit a notebook that uses transfer learning, you have to split your work into two notebooks. The first notebook trains and saves the model; it can access the internet to retrieve the pre-trained weights. You then create a Kaggle dataset from the saved model.
The second part is the notebook that you submit. This notebook loads the saved model from the Kaggle dataset. This model is then used to create the predictions.
Optimize using EfficientNet, a different pre-trained model
Currently, one of the best performing models is EfficientNet. As the name suggests, EfficientNet is one of the most efficient models: for a given accuracy, it requires comparatively little computing power for inference. The graph below shows a comparison of EfficientNet against other pre-trained models.

There are eight variants of EfficientNet, B0 to B7. Each successive variant is more accurate but uses more parameters (see the graph above).
We use the EfficientNetB3 model, a good trade-off between accuracy and the number of parameters.
We create an instance of EfficientNetB3 and use it as the first part of our model. We then use a GlobalAveragePooling2D layer to pool the feature maps and add a single Dense layer with 256 units. A Dropout layer prevents overfitting. The output layer is still the same Dense layer with five units – one for each category.
Another thing we changed to improve our model is the size of the input image. Previously we used 150×150, but for EfficientNet we use 512×512.
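Here is a sketch of this variant; the Dropout rate and optimizer settings are assumptions. Note that the tf.keras EfficientNet models rescale their input internally, so they expect raw 0–255 pixel values from the generators.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import EfficientNetB3

def create_cnn_model(input_shape=(512, 512, 3), num_classes=5):
    # Pre-trained base; EfficientNet handles input scaling itself.
    base = EfficientNetB3(weights="imagenet", include_top=False,
                          input_shape=input_shape)

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),                               # assumed rate
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizers.Adam(learning_rate=1e-4),  # assumed
                  metrics=["accuracy"])
    return model
```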
When we train the model, the accuracy increases for 25 epochs. Training and validation accuracy follow each other nicely, which shows a healthy model. The same is true for the training and validation loss. The validation accuracy climbs to around 0.87 while the validation loss flattens out at 0.33.


After submitting this model, we moved to place 2343 on the public leaderboard. Another significant step forwards.


Optimizing using Test Time Augmentation (TTA)
The ImageDataGenerator we use to feed images into our network during training randomly transforms the photos to increase the number and variation of the training images. When we create predictions, we also use an ImageDataGenerator, but we don’t augment the images. What happens if we change this?
Test Time Augmentation (TTA) is a technique to increase the accuracy of your predictions. We don’t use the test image directly anymore. First, we transform the picture with different transformations such as rotating, cropping, flipping, enhancing contrast, etc. We then get a prediction for each of these altered images.

We then average the predictions, which gives us the final prediction for the image. Most of the time, this improves the accuracy of your predictions.
Implementing TTA in our notebook
As we are already using an ImageDataGenerator to serve the test images to the model, we only have to add the augmentation options to the generator and iterate multiple times through the test set. We iterate ten times through all test images, save the predictions, and finally calculate the average of those predictions.
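Here is a sketch of that loop; the augmentation settings and the Kaggle input path are assumptions, and model is the EfficientNetB3 model loaded with load_model as before.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Test-time generator *with* augmentation this time.
tta_datagen = ImageDataGenerator(
    rotation_range=40,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2)

tta_rounds = 10
all_predictions = []

for _ in range(tta_rounds):
    tta_generator = tta_datagen.flow_from_directory(
        "../input/cassava-leaf-disease-classification",   # assumed location
        classes=["test_images"],
        target_size=(512, 512),
        batch_size=1,
        class_mode=None,
        shuffle=False)            # keep the image order fixed across rounds
    all_predictions.append(model.predict(tta_generator))

# Average the ten prediction rounds per image and pick the most likely class.
mean_predictions = np.mean(all_predictions, axis=0)
labels = np.argmax(mean_predictions, axis=1)
```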
I reused the model that scored 0.880. With TTA, the accuracy of the predictions went up to 0.889, as you can see below.


Conclusion and further optimization
We started with a simple Convolutional Network that gave us an accuracy of 71%. The accuracy improved to 88% after we switched to using Transfer Learning using DenseNet201 and EfficientNet. By using Test Time Augmentation, we almost got it up another percent to 88.9%. The source code can be found in this Github repository.
If we compare our score to the top of the leaderboard, which is 91.1%, there is still a difference of 2.2%. By reading through the notebooks that the Kaggle community shared, I saw some additional techniques to improve the accuracy. I will mention them below and explain them briefly.
Use a bigger and more accurate pre-trained model such as EfficientNetB7. Bigger means that you will need a larger GPU or TPU with more memory, and training will take a lot longer.
Use K-Fold Cross-Validation. This is a different method for splitting your data into a training and validation set. You divide the collection of images into K parts of equal size. You then train K models, each time with a different training and validation set. To create a prediction, you combine the predictions of each model.
Use CutMix for image augmentation. It is a new strategy that cuts and pastes random patches between training images. It also adjusts the label of the picture. This way, it forces the model to focus on less discriminative parts of the cassava leaf.
Use Mixup for image augmentation. With Mixup, you combine two images and their labels using linear interpolation (see the sketch after this list). Research showed that Mixup improves the generalization of neural network architectures.
Multi-model predictions or ensemble models combine the predictions from multiple models to improve the overall performance.
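To illustrate the Mixup idea, here is a minimal sketch that mixes two batches of images and their one-hot labels with a single lambda drawn from a Beta distribution. I have not used this in the competition yet.

```python
import numpy as np

def mixup_batch(images_a, labels_a, images_b, labels_b, alpha=0.2):
    """Mix two batches with one lambda drawn from a Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    mixed_images = lam * images_a + (1.0 - lam) * images_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_images, mixed_labels

# Example with random data: two batches of 8 images (150x150x3)
# and one-hot labels for the five classes.
images = np.random.rand(2, 8, 150, 150, 3)
labels = np.eye(5)[np.random.randint(0, 5, size=(2, 8))]
mixed_x, mixed_y = mixup_batch(images[0], labels[0], images[1], labels[1])
```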
Most of these things are new to me. But as I love to learn, I will try these techniques to improve my model’s accuracy. I will let you know in the next article how it went.
Thank you for reading, and remember never to stop learning!
Used resources
Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely Connected Convolutional Networks. arXiv preprint arXiv:1608.06993, 2016.
A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS, 2012.