Hands-on Tutorials
Written by Clementine Pages, Jean Joanbon and selva cleber – March 2021

1. Context
This project was built as part of the validation of our Data Scientist Bootcamp at DataScientest. It puts into practice everything we learnt during 11 weeks of theoretical classes and demonstrates that every topic has been mastered.
The goal of this project is to localize a plant in a picture and classify its species. Once the classification is done, the application returns a description of the plant and identifies a possible disease.
2. Project
Dataset – Iteration 1
The dataset, New Plant Diseases Dataset, comes from Kaggle; it has been used extensively by many users and contains no errors or wrongly classified images.
This Dataset is composed of leaves laid on a uniform background:

It contains around 87,000 images of leaves, both diseased and healthy. These pictures represent 14 plants spread over 38 classes:

The class distribution is shown below:

This dataset was recreated using offline augmentation from another dataset; as the chart above shows, the classes are well balanced (around 3% per class, outer ring).
To test our model on images of leaves in a vegetal environment, we implemented a web-scraping script to extract images from Google Images. This test dataset is composed of the same 38 classes plus an extra class, "Others", representing images without plants.
The images from the test set do not follow the same format as the original dataset (one leaf per image).
Deep Learning – Iteration 1
As a first iteration, we developed our own CNN model with TensorFlow:

This model is composed of three convolutional blocks and one classification block.
Convolutional block layers:
- Image size: images are resized to 128x128x3, a good compromise between computation time and information loss.
- Conv2D: uses filters to extract patterns from the images. In order to capture a larger combination of patterns, the number of filters is doubled in each Conv2D layer compared to the previous one.
- Activation function: applies non-linearity with ReLU, the most commonly used activation for CNNs.
- BatchNormalization (BN): normalizes the output of the previous layers; it makes the CNN faster and more stable, and decreases overfitting.
- MaxPool2D: reduces the dimensionality of the images by reducing the number of pixels; this lowers the number of parameters of the model and provides a slight scale invariance to the internal representation.
Classification block layers:
- Dense: three fully connected layers used for classification.
- Dropout: prevents overfitting by randomly ignoring selected neurons during training.
Layer between Conv blocks and classification block:
- GlobalAveragePooling2D (GAP): averages the values of each feature map. Unlike a Flatten layer, GAP removes a large number of trainable parameters and provides translation invariance, thus reducing the tendency to overfit.

Regularization: in order to prevent overfitting, we used L2 regularization, which adds the sum of the squared parameters (weights) to the loss function.
Model code is presented hereafter:
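A minimal sketch of this architecture with the Keras Sequential API is shown below; the exact filter counts, dense-layer sizes and L2 factor are illustrative assumptions rather than our exact values:

```python
from tensorflow.keras import layers, models, regularizers

def build_model(num_classes=38, weight_decay=1e-4):
    """Three convolutional blocks followed by a classification block."""
    reg = regularizers.l2(weight_decay)  # L2 factor is an assumed value
    model = models.Sequential([layers.Input(shape=(128, 128, 3))])
    # Convolutional blocks: the number of filters doubles at each block
    for filters in (32, 64, 128):
        model.add(layers.Conv2D(filters, 3, padding="same",
                                activation="relu", kernel_regularizer=reg))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPool2D())
    # GAP instead of Flatten: fewer parameters, some translation invariance
    model.add(layers.GlobalAveragePooling2D())
    # Classification block: three dense layers with dropout in between
    model.add(layers.Dense(256, activation="relu", kernel_regularizer=reg))
    model.add(layers.Dropout(0.3))
    model.add(layers.Dense(128, activation="relu", kernel_regularizer=reg))
    model.add(layers.Dropout(0.3))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```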
This model reaches 99.3% accuracy on the validation dataset.
However, it performs very poorly on the test dataset, with an accuracy of only 13.2%.
Summary of iteration 1:
- Dataset: "new-plant-diseases-dataset", leaf images on a uniform background.
- Excellent predictions on the validation dataset – val accuracy > 99%.
- Bad predictions on the test dataset, composed of leaf images in a natural environment.
- The model predicts plants even on images that contain no plant.
The main reason for the bad predictions on leaf images in a natural environment is that our model was trained on a dataset of leaf images on a uniform background. Moreover, since the training dataset contains no images without plants, it is impossible for our model to predict that no plant is present in an image.

Deep Learning – Iteration 2
In a second iteration, we attempted to improve our model by adding a vegetal background to the leaf images. We modified our dataset and used the Plant Village dataset from Kaggle, which is similar to our first dataset (leaf images on a uniform background) but comes without any data augmentation. As a result, the dataset is highly unbalanced (more than 5,000 images for some classes and fewer than 200 for others), but it contains segmented leaf images we could use to add vegetal backgrounds.

To rebalance this dataset, data augmentation was performed using the transformations described hereafter:
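As a rough sketch, such offline augmentation can be implemented with Keras's ImageDataGenerator; the transformation set and parameter values here are assumptions for illustration, not our exact configuration:

```python
from tensorflow.keras.preprocessing.image import (ImageDataGenerator,
                                                  img_to_array, load_img)

# Illustrative transformation set (assumed values)
augmenter = ImageDataGenerator(
    rotation_range=25,       # random rotations
    width_shift_range=0.1,   # horizontal shifts
    height_shift_range=0.1,  # vertical shifts
    zoom_range=0.2,          # random zooms
    horizontal_flip=True,    # random mirroring
    fill_mode="nearest",
)

def augment_image(path, n_copies=5):
    """Return n_copies randomly transformed variants of one leaf image."""
    x = img_to_array(load_img(path, target_size=(128, 128)))
    return [augmenter.random_transform(x) for _ in range(n_copies)]
```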

In order to detect the absence of plants in the images, we added another class, "Others", to our dataset using random images from the ImageNet dataset.

After data augmentation and the addition of the new class "Others", the dataset contains 114,077 images, distributed as follows:

We used a "custom Dataset" to load the pictures and add a background, replacing the black pixels of the segmented pictures with pixels from our background images.
Replacing pixels is done with the function tf.where, which substitutes a background-image pixel for the leaf-image pixel whenever the value of the leaf-image pixel is below a defined threshold.
Note that for this second iteration we used only ONE background per plant type. For instance, the same tomato-plant background is used for all tomato classes.

The code is presented below:
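A minimal sketch of this compositing step, assuming float32 images in the 0–255 range (the threshold value is an assumption):

```python
import tensorflow as tf

THRESHOLD = 30.0  # assumed value: pixels darker than this count as background

def add_background(leaf, background, threshold=THRESHOLD):
    """Replace the black pixels of a segmented leaf image with background pixels.

    Both tensors have shape (H, W, 3), dtype float32, values in [0, 255].
    """
    # A pixel belongs to the background when all of its channels are near zero
    is_black = tf.reduce_max(leaf, axis=-1, keepdims=True) < threshold
    # tf.where keeps the leaf pixel where the mask is False
    # and takes the background pixel where it is True
    return tf.where(is_black, background, leaf)
```

In a tf.data pipeline, this function can then be mapped over (leaf, background, label) triplets before batching.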
We used the same model to compare results between both iterations, except that the last Dense layer has an output shape of 39 (38 plant classes + Others).
This model reaches 95.1% accuracy on the new validation dataset.
However, it performs very poorly on the test dataset, with an accuracy of only 10.5%.
Summary of iteration 2:
- Dataset: "Plant Village" with data augmentation and the new class "Others" built from the ImageNet dataset.
- Backgrounds added during model training (only one background per plant type).
- Good predictions on images related to the dataset.
- Slight improvement in the predictions on the test dataset, but performance is still lower than expected, especially on images where the leaves cannot be seen completely.
- The model seems able to detect when an image contains no plant.
- The model seems to focus on the background instead of the leaf itself.
During iteration 2, our model learnt to use the background instead of the leaves, so we ran another iteration, "2-bis", using exactly the same model and approach, but instead of a single background per plant we created a set of backgrounds per plant, from which one is chosen at random each time (a selection sketch follows the lists below).
Thus, we created one folder per plant (14 plants in our dataset), each containing:
- 5 images of the actual plant (e.g. potato plants for all potato classes).
- 4 images of plants, lawn and forest, shared across every folder.
This should help the model to:
- Not focus on the background image.
- Have better performance in a real environment.
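A sketch of the random background selection, assuming one folder of nine backgrounds per plant (paths and folder names are hypothetical):

```python
import random
from pathlib import Path

BACKGROUND_ROOT = Path("backgrounds")  # hypothetical: one sub-folder per plant

def pick_background(plant_name):
    """Pick one of the backgrounds available for a given plant at random."""
    candidates = sorted((BACKGROUND_ROOT / plant_name).glob("*.jpg"))
    return random.choice(candidates)

# e.g. pick_background("Tomato") returns one of the 5 tomato-plant images
# or one of the 4 shared plant/lawn/forest images placed in every folder.
```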
This model (2-bis) reaches 98.4% accuracy on the validation dataset.
However, it still performs poorly on the test dataset, with an accuracy of 18.1%.
Summary of iteration 2-bis:
- Dataset: "Plant Village" with data augmentation and the new class "Others" built from the ImageNet dataset.
- Backgrounds added during model training (chosen at random).
- Good predictions on images related to the dataset.
- Slight improvement in the predictions on the test dataset compared to iteration 2: as the backgrounds are random, the model no longer focuses on them.
- The model seems able to detect when an image contains no plant.
We can see some improvement in the prediction performance on our test dataset, though the results are still below expectations. However, this model can predict whether an image contains a plant or not, which is an enhancement compared to our first iteration.

Deep Learning – Iteration 3
During iteration 2, we included a new class called "Others", representing images of "anything". The idea was to help the model avoid predicting a plant when there is none in the image, but the results on the test dataset are not as good as expected. One solution would be to predict the probability that a plant is present in the image: one output represents whether the image contains a plant, while the second output represents the plant classification.

Hereafter is the code of the model – we used the Keras functional API:
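A minimal sketch of such a model with the functional API, reusing the convolutional backbone of iteration 1 (layer sizes remain illustrative); the plant/no-plant neuron and the 38 class neurons are concatenated into one 39-value output so the custom loss below can see both:

```python
from tensorflow.keras import layers, Model

def build_two_output_model(num_classes=38):
    inputs = layers.Input(shape=(128, 128, 3))
    x = inputs
    # Same three convolutional blocks as in iteration 1
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPool2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    # First neuron: probability that the image contains a plant
    is_plant = layers.Dense(1, activation="sigmoid", name="is_plant")(x)
    # Remaining 38 neurons: plant and disease classification
    plant_class = layers.Dense(num_classes, activation="softmax",
                               name="plant_class")(x)
    # Concatenated so the custom loss receives both predictions together
    return Model(inputs, layers.Concatenate()([is_plant, plant_class]))
```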
To evaluate how well our model performs, we need to create a custom loss function:

with P ∈ {0, 1} indicating whether the image is a plant (1) or not (0). The categorical cross-entropy is not calculated if the image is not a plant.
In our model, the first neuron represents whether the image contains a plant or not, and the 38 other neurons represent the plant classification.
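A sketch of such a loss, under the assumption that it combines a binary cross-entropy on the plant/no-plant neuron with a categorical cross-entropy on the 38 class neurons, the latter masked out for non-plant images:

```python
import tensorflow as tf

def plant_loss(y_true, y_pred):
    """Binary CE on the first neuron + masked categorical CE on the other 38."""
    is_plant_true = y_true[:, :1]  # 1.0 if the image contains a plant, else 0.0
    is_plant_pred = y_pred[:, :1]
    class_true, class_pred = y_true[:, 1:], y_pred[:, 1:]
    bce = tf.keras.losses.binary_crossentropy(is_plant_true, is_plant_pred)
    cce = tf.keras.losses.categorical_crossentropy(class_true, class_pred)
    # The classification term is not counted when the image is not a plant
    return bce + tf.squeeze(is_plant_true, axis=-1) * cce

# model.compile(optimizer="adam", loss=plant_loss)
```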
This model (3) reaches 97.4% accuracy on the validation dataset.
However, it performs very poorly on the test dataset, with an accuracy of 19.2%.
Still, predictions on images without plants (class "Others") in the test dataset are better than those obtained with the iteration 2-bis model.
Summary of iteration 3:
- Dataset: "Plant Village" with data augmentation and the new class "Others" built from the ImageNet dataset.
- Excellent predictions on the validation dataset – val accuracy > 97%.
- Predictions on the test dataset are still bad.
- Better predictions on images without a plant (class "Others") in the test dataset.

Improvement
- New dataset with images in a real environment: as we found out during the testing phases, the model does great on images with a single leaf on a uniform background but cannot handle images taken in natural settings. We tried to fix this issue by adding backgrounds, but we can still see flaws in the outcome. Creating a full dataset from images taken directly in a real environment (or a combination of both) might improve the reliability of the results. This can be done using web scraping.
- Concatenation of two models (multi-output): in the current dataset, some plants share the same disease (e.g. Bacterial Spot on tomato and peach leaves, or Late Blight on potato and tomato leaves). In a natural environment, our model seems to spot the correct disease but not the correct plant. One solution would be to create a multi-output model:
- One model focusing on the leaf (plant) classification.
- One model focusing on the disease classification.
With this method, we can predict the plant and the disease separately. This could improve the model and might allow it to predict a known disease on a new plant – e.g. Bacterial Spot (a known disease) on apple. A sketch of this idea is given after this list.

- Semantic segmentation: to improve prediction in a natural environment, a semantic segmentation model such as UNET would help extract the pixels of the leaves; the leaf classification model would then predict the plant itself.
- Deeper model: as leaf images in a real environment are more complex, a deeper model with more convolutional blocks, able to capture more patterns, should give better predictions.
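One possible shape for the multi-output idea above, sketched as a shared backbone with two softmax heads; the disease-class count and layer sizes are assumptions:

```python
from tensorflow.keras import layers, Model

def build_plant_disease_model(num_plants=14, num_diseases=21):
    # num_diseases is a hypothetical count of distinct diseases plus "healthy"
    inputs = layers.Input(shape=(128, 128, 3))
    x = inputs
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPool2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Two independent heads sharing the same extracted features
    plant = layers.Dense(num_plants, activation="softmax", name="plant")(x)
    disease = layers.Dense(num_diseases, activation="softmax", name="disease")(x)
    model = Model(inputs, [plant, disease])
    model.compile(optimizer="adam",
                  loss={"plant": "categorical_crossentropy",
                        "disease": "categorical_crossentropy"},
                  metrics=["accuracy"])
    return model
```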
In Summary

The models developed through these three iterations make excellent predictions on the validation dataset, with accuracy of up to 99%. It was a natural choice to test those models on images in a natural environment, to represent a real business case.
But we quickly found that, however much we tried to improve our model to handle images in a real environment, we could not achieve this goal.
No matter how far we tweak the model, it can only predict what it was trained on – images of leaves on a uniform background.
We were able to validate some hypotheses – detection of images without plants, classification of images with uniform backgrounds – and to reject the hypothesis that adding backgrounds would improve the model's robustness.
There is still room for improvement, as proposed previously, but before anything else we should consider using a dataset that is closer to the actual problem: predicting plant disease from any photograph.
Written by Clementine Pages, Jean Joanbon and selva cleber. Thanks to the DataScientest team for their help.
_Iterations code can be found on Github._
Acknowledgements
Samir Bhattara (2018, Nov.), New Plant Diseases Dataset v2
Original dataset: spMohanty (2018, Sept.), PlantVillage-Dataset
Abdallah Ali (2019, Sep.), Plant Village dataset v3
ImageNet (2016), Stanford Vision Lab, Stanford University, Princeton University