Transfer Learning with Fruit Classification

A brief overview of transfer learning and an example of how to use the pre-trained InceptionV3 model for your own deep learning projects.

James Nelson
Towards Data Science


“An image of a hypothetical Artificial Intelligence Brain” by Jack Moreh, from Freerange

For those of you who don’t know what deep learning is, it’s a subfield of machine learning that falls under the broader category of artificial intelligence. This type of technology isn’t I, Robot-style artificial intelligence, and we as a species are still far away from actually developing something like that. However, deep learning and many other machine learning approaches are as close as it gets when we talk about programmable computers acting like humans.

“A diagram of a neural network”

Deep Learning

So briefly, deep learning is a machine learning approach that goes beyond shallow learning, where a model uses only one or two layers of representations; deep learning stacks layers upon layers of learning. The models built from these layers are called neural networks, a name that stems from the study of neurobiology, but don’t be confused: they aren’t actually networks made to emulate how a brain functions. Neural networks come in many different forms, but we will focus on only one for our transfer learning: the convolutional neural network.

“The convolutional process”

So what is a convolutional neural network?

A convolutional neural network is just another method for training a network (model) to produce accurate classifications. Out of all the neural networks out there, convolutional neural networks perform especially well for computer vision. What makes CNNs (or convnets) amazing is that the patterns they learn from images are translation invariant: if a convnet picks up a pattern in one corner of an image, it can recognize that same pattern anywhere in any other image, whereas a regular densely connected network would have to re-learn it at every new position. Convnets are also able to learn spatial hierarchies of patterns, meaning each layer of a convnet learns something different: the first layer may learn small patterns, and the next layer may learn bigger patterns composed of features from the first layer. These features are obtained through the convolution function.

“A gif of the movement of a convolution,” by Narges Khatami from Wikimedia Commons

A Convolution

The convolution function is used to produce feature maps (feature matrices) for a convolutional layer. At the baseline, convnets have learnable weights organized into kernels. A kernel is used to extract distinct features from an input image (the input layer); for example, it can be used to sharpen the image or to detect edges. A kernel can be represented as an n x n matrix of unique values. The kernel convolves (slides and multiplies) over the input image. Say the input image is (10, 10) and the kernel is (3, 3): at its first position (stride), the kernel multiplies element-wise with the 9 pixels in the very top-left corner of the input image, and the products are summed to produce a single pixel in the top-left corner of a new matrix called a feature map.

UPDATE: In the figure below, the second multiplication should be 4 * 2 = 8, replacing the product of 1 with 8 among the products being summed.

by Krut Patel from Towards Data Science, 2019; annotated by James Nelson, 2020.

This multiplication continues as the kernel slides across the input image like so.

by Krut Patel from Towards Data Science, 2019; annotated by James Nelson, 2020.
by Krut Patel from Towards Data Science, 2019; annotated by James Nelson, 2020.

This process doesn’t stop until the entire feature matrix has been filled with these convolved values. Once the feature matrix is complete, it is stacked inside a convolutional layer. If the network is designed to do so, another kernel will produce another feature matrix from the same input image, and that feature matrix is stored within the same convolutional layer.

“What happens when feature matrices are made”
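
To make that sliding arithmetic concrete, here is a minimal NumPy sketch of a single valid convolution (stride 1, no padding); the function and array names are my own illustration, and it skips the bias and activation a real convolutional layer would apply.

import numpy as np
def convolve2d(image, kernel):
    # Slide the kernel over the image and sum the element-wise
    # products at each position (stride 1, no padding)
    ih, iw = image.shape
    kh, kw = kernel.shape
    out_h, out_w = ih - kh + 1, iw - kw + 1  # (10,10) input, (3,3) kernel -> (8,8)
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]  # the 9 pixels under the kernel
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map
print(convolve2d(np.random.rand(10, 10), np.random.rand(3, 3)).shape)  # (8, 8)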

To keep it short, training a convolutional neural network is about locating the right values for each of the kernels so that, when an input image is passed through the layers, it activates the right neurons on the final output layer to accurately classify the image.
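
As a toy illustration of that final step (hypothetical activations, three classes instead of 120), the softmax on the output layer turns the neurons’ raw activations into class probabilities, and the highest one is the predicted class.

import numpy as np
logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw outputs for 3 classes
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
print(probs)             # approximately [0.79, 0.18, 0.04]
print(np.argmax(probs))  # 0 -> the network classifies the image as class 0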

What Is Transfer Learning and How Can It Help?

Transfer learning makes life easier and better for everyone. Although creating convolutional neural networks from scratch is fun, they can be pricey and cost a lot of computational power. To reduce the amount of power our network needs, we use transfer learning: pre-trained weights that have already undergone training on another, much larger dataset, used to increase the performance of our own network. What makes pre-trained models an optimal choice is that they have already been configured and trained, often for days at a time, on millions of images spanning thousands of classes, providing the highly capable pre-trained weights we need to train a network of our own with ease (Aditya Ananthram, 2018).

The Practical Application

Now in order to demonstrate the practical application of transfer learning’s capabilities, I will cover the data that was used, the pre-trained model of choice, the model architecture, and then the code.

“A farmer’s market stand in Italy” by Merelize from Freerange

The Description of the Data

The dataset contains 81,104 images of different fruits and vegetables spanning 120 unique classes. The total is split into training and testing datasets: the training dataset contains 60,486 images and the testing dataset contains 20,618 images.

All images are 100x100 pixels and were collected with a Logitech C920 camera that was used to film the fruits and vegetables (Mihai Oltean, 2019). Each fruit or vegetable was planted in the shaft of a low-speed motor and recorded in a short video of about 20 seconds. The testing images were taken with a Nexus 5X smartphone.

The Model

“Image of a ‘Totem’ spinning. Are you still dreaming?” Photo by Ash from Modern Afflatus on Unsplash

The transfer learning model of choice is called InceptionV3. The model is a convolutional neural network that is 48 layers deep and was trained on images of shape 299 by 299. The original Inception architecture was called “GoogLeNet,” a 27-layer-deep convolutional neural network built back in 2014 (Shaikh, 2018). The name of the model derives from the movie “Inception,” directed by Christopher Nolan, and the concept of going deeper into a dream, “a dream within a dream,” translating to a convolutional neural network within a convolutional neural network.

The idea behind GoogLeNet’s design was to eliminate the issues commonly found with overfitting when working with deeper neural networks. Overfitting usually occurs when a dataset is too small for the large neural network it is trained in, and the problem it presents is a misleading gap between training accuracy and the validation (testing) accuracy of the model. Testing accuracy measures how precisely the trained network predicts images it hasn’t seen. The solution that let an enormous network still produce this accuracy was to create a sparsely connected neural network in place of a fully connected one (Shaikh, 2018), and that is why the GoogLeNet model won the ImageNet Visual Recognition Challenge with a predictive accuracy of 80%+ back in 2014.

Model Architecture

The InceptionV3 model is connected to two fully connected layers at the bottom, but first has its dimensionality reduced from 3D to 1D with Global Average Pooling 2D. The pooling also outputs one response for every feature matrix. After the pooling, the next layer of the architecture is the first dense hidden layer with 512 units (neurons), which connects to the final output layer with 120 neurons to match the number of fruit and vegetable classes. This is what the InceptionV3 architecture looks like.

“An architectural design of the pre-trained InceptionV3 model” by Milton-Barker Adam from Intel

And this is what the fully connected layers attached to the bottom of the architecture look like.

“The bottom architectural layout of the InceptionV3 model attached to the fully connected layers”
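
To see what Global Average Pooling 2D does to the dimensionality, here is a small sketch; the (8, 8, 2048) shape mirrors InceptionV3’s convolutional output, though the random values are just stand-ins.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import GlobalAveragePooling2D
# A hypothetical batch of two samples, each with 2048 feature maps of size 8x8
feature_maps = tf.constant(np.random.rand(2, 8, 8, 2048), dtype=tf.float32)
pooled = GlobalAveragePooling2D()(feature_maps)
print(feature_maps.shape)  # (2, 8, 8, 2048): 3D feature maps per sample
print(pooled.shape)        # (2, 2048): one averaged response per feature map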

It is also worth mentioning how fine-tuning these pre-trained models and the weights associated with them works. I can choose which weights I want to freeze in the model, whether the top half, the bottom half, the middle, or all of them. Freezing a portion of the pre-trained model means those weights won’t be trainable and can’t be updated for the model I am making. I can also choose which dataset the loaded weights were trained on. For this example, through trial and error, I chose not to freeze any weights. My implementation of InceptionV3 will use the pre-trained weights from ImageNet. “ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node.” (“ImageNet,” 2017)
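
As a quick sketch of what freezing looks like in Keras (conv_base here is the pre-trained base we load in the code below; the half-and-half split is just an illustration, not what I ended up using):

# Freeze every layer of the pre-trained base so none of its weights update
for layer in conv_base.layers:
    layer.trainable = False
# Or freeze only the first half and leave the rest trainable for fine-tuning
halfway = len(conv_base.layers) // 2
for layer in conv_base.layers[:halfway]:
    layer.trainable = False
for layer in conv_base.layers[halfway:]:
    layer.trainable = True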

Finally, the Code

Now that you have an idea of what the dataset looks like and an idea of the model architecture, it’s time to execute.

Preparing the Data and Training the Network

Loading the Libraries.

First things first: we have to load the necessary libraries. When loading them, ensure that every module we need is imported so we can prepare the data and train the model.

# read in libraries
import tensorflow as tf
from tensorflow.keras import backend, models, layers, optimizers
import numpy as np
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import plot_model
from IPython.display import display
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os, shutil
from tensorflow.keras.models import Model
np.random.seed(42)

Load the Data and Prepare It.

Next, in order to prepare the data, we need to set up a train_datagen and a test_datagen with ImageDataGenerator. Then, with those generators, we resize the images of the training and testing data to match the pre-trained model’s expected input size, and augment the training images so the neural network doesn’t learn irrelevant patterns, which in turn boosts overall performance.

# Specify the base directory where images are located.
base_dir = '/kaggle/input/fruits/fruits-360/'
# Specify the training and test directories.
train_dir = os.path.join(base_dir, 'Training')
test_dir = os.path.join(base_dir, 'Test')
# Normalize the pixels in the train data images, resize and augment the data.
train_datagen = ImageDataGenerator(
    rescale=1./255,        # Rescale pixel values from [0, 255] to [0, 1]
    shear_range=0.2,       # Shear image by up to 20%
    zoom_range=0.2,        # Zoom in on image by up to 20%
    horizontal_flip=True)  # Flip image horizontally
# Normalize the test data images and resize them, but don't augment them
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(299, 299),
    batch_size=16,
    class_mode='categorical')
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(299, 299),
    batch_size=16,
    class_mode='categorical')

Prepare the InceptionV3 Model

Now that the images are prepared, it’s time to import and set up the pre-trained InceptionV3 model for transfer learning.

# Load the InceptionV3 class
from tensorflow.keras.applications.inception_v3 import InceptionV3
# Always clear the backend before training a model
backend.clear_session()
# Instantiate the InceptionV3 model with the weights from ImageNet
conv_base = InceptionV3(weights='imagenet',  # Use the InceptionV3 CNN trained on ImageNet data
                        include_top=False)   # Drop the original fully connected output layers

Create a Functional API Model.

Now let's combine the pre-trained InceptionV3 model weights with the dense layers (fully connected layers) and reduce the dimensionality of the model in between the two.

# Connect the InceptionV3 output to the fully connected layers
InceptionV3_model = conv_base.output
pool = GlobalAveragePooling2D()(InceptionV3_model)
dense_1 = layers.Dense(512, activation = 'relu')(pool)
output = layers.Dense(120, activation = 'softmax')(dense_1)

Display the Functional API Model.

To get an understanding of the model architecture, we can display the functional API model as a whole to visually see the depth of the network.

# Create an example of the architecture to plot on a graph
model_example = models.Model(inputs=conv_base.input, outputs=output)
# plot graph
plot_model(model_example)

(The model is way too big to display here on Medium, click this link to see)

Define the Model and Compile it.

In order to train the model, we need to define the functional API model and compile it with categorical cross-entropy as the loss function and stochastic gradient descent (SGD) as the optimizer, with learning rate and momentum parameters.

# Define/Create the model for training
model_InceptionV3 = models.Model(inputs=conv_base.input, outputs=output)
# Compile the model with categorical crossentropy for the loss function and SGD
# for the optimizer, with the learning rate at 1e-4 and momentum at 0.9
model_InceptionV3.compile(loss='categorical_crossentropy',
                          optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                          metrics=['accuracy'])

Check the Device List for the GPU to Use.

Now, I recommend using a GPU to train this model, since InceptionV3 has over 21 million parameters and training on a CPU could take days to complete. If you have your own GPU you can use it; I used the GPUs Kaggle supplies to its notebooks, which took about 20–25 minutes for the training to complete. List the available devices to confirm a GPU is usable so the training process can be sped up.

# Import from tensorflow the module that lists local devices, then print them
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Train the Model.

After finding the GPU to use, we incorporate it into our code to finally train the model, with train_generator supplying the training data and the validation_data parameter set to test_generator.

# Train the model within the tf.device block, utilizing the discovered GPU
import tensorflow as tf
with tf.device("/device:GPU:0"):
    history = model_InceptionV3.fit(
        train_generator,
        epochs=5,
        validation_data=test_generator,
        verbose=1,
        callbacks=[EarlyStopping(monitor='val_accuracy',
                                 patience=5,
                                 restore_best_weights=True)])

99% validation accuracy with a loss of 0.0187 is more than good.

Display the Model’s Testing Accuracy and Testing Loss Value

Now let’s see how our model looks by plotting the training accuracy/validation accuracy and training loss/validation loss across the epochs, then print the final test accuracy and test loss.

# Create a dictionary of the model history 
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']
epochs = range(1, len(history_dict['accuracy']) + 1)
# Plot the training/validation loss
plt.plot(epochs, loss_values, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss_values, 'b', label = 'Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# Plot the training/validation accuracy
plt.plot(epochs, acc_values, 'bo', label = 'Training accuracy')
plt.plot(epochs, val_acc_values, 'b', label = 'Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Evaluate the test accuracy and test loss of the model
test_loss, test_acc = model_InceptionV3.evaluate(test_generator)
print('Model testing accuracy/testing loss:', test_acc, " ", test_loss)

Analysis of the Results

The results for accurately predicting 120 classes of fruit and vegetable images come out to 99% testing accuracy with a loss value of 0.0187. The loss value measures how far off our outputs are from what we expected. The training ran for 5 epochs and took around 20–25 minutes to complete, with the help of Kaggle’s GPU to speed up the process. Each epoch of training was a 3,781-step process (iterations), taking a batch of 16 samples at each step and propagating it forward and backward through the network to give us one pass. One pass equals one iteration.
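
As a quick sanity check on that step count, the iterations per epoch are just the training-set size divided by the batch size, rounded up:

import math
train_images = 60486  # training images in the dataset
batch_size = 16       # samples per forward/backward pass
print(math.ceil(train_images / batch_size))  # 3781 steps (iterations) per epoch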

Conclusion

In conclusion, transfer learning is a very effective way to train models to recognize and classify images. It allows for a fast setup, without going into the detail of designing a convolutional neural network architecture from scratch, and it provides high accuracy thanks to the pre-trained model’s previous training. You can head over to my project’s Kaggle notebook for anything further.

References

Aditya Ananthram. (2018, October 17). Deep Learning For Beginners Using Transfer Learning In Keras. Retrieved April 24, 2020, from Medium website: https://towardsdatascience.com/keras-transfer-learning-for-beginners-6c9b8b7143e

Shaikh, F. (2018, October 18). Understanding Inception Network from Scratch (with Python codes). Retrieved May 7, 2020, from Analytics Vidhya website: https://www.analyticsvidhya.com/blog/2018/10/understanding-inception-network-from-scratch/

Chollet, F. (2018). Deep Learning with Python. Shelter Island, NY: Manning.

File:Valid-padding-convolution.gif — Wikimedia Commons. (2018, July 6). Retrieved May 6, 2020, from Wikimedia.org website: https://commons.wikimedia.org/wiki/File:Valid-padding-convolution.gif

Get Free Stock Photos of Concept of Intelligence with Human Brain on Blue Background Online | Download Latest Free Images and Free Illustrations. (2020). Retrieved May 9, 2020, from Freerangestock.com website: https://freerangestock.com/photos/65677/concept-of-intelligence-with-human-brain-on-blue-background.html

Get Free Stock Photos of fruit and vegetables vendor italy Online | Download Latest Free Images and Free Illustrations. (2020). Retrieved May 9, 2020, from Freerangestock.com website: https://freerangestock.com/photos/37652/fruit-and-vegetables-vendor-italy.html

syt123450. (2020). Layer — GlobalPooling2d. Retrieved May 6, 2020, from Tensorspace.org website: https://tensorspace.org/html/docs/layerGlobalPooling2d.html

ImageNet. (2017). Retrieved May 7, 2020, from Image-net.org website: http://www.image-net.org/

Krut Patel. (2019, September 8). Convolutional Neural Networks — A Beginner’s Guide — Towards Data Science. Retrieved April 24, 2020, from Medium website: https://towardsdatascience.com/convolution-neural-networks-a-beginners-guide-implementing-a-mnist-hand-written-digit-8aa60330d022

Mihai Oltean. (2020). Fruits 360. Retrieved May 6, 2020, from Kaggle.com website: https://www.kaggle.com/moltean/fruits

Milton-Barker, A. (2019, February 17). Inception V3 Deep Convolutional Architecture For Classifying Acute Myeloid/Lymphoblastic Leukemia. Retrieved May 6, 2020, from Intel.com website: https://software.intel.com/en-us/articles/inception-v3-deep-convolutional-architecture-for-classifying-acute-myeloidlymphoblastic

Prakhar Ganesh. (2019, October 18). Types of Convolution Kernels : Simplified — Towards Data Science. Retrieved April 24, 2020, from Medium website: https://towardsdatascience.com/types-of-convolution-kernels-simplified-f040cb307c37

Unsplash. (2020). Ash from Modern Afflatus. Retrieved May 9, 2020, from Unsplash.com website: https://unsplash.com/@modernafflatusphotography

MathWorks. (2016). inceptionv3. Retrieved April 27, 2020, from Mathworks.com website: https://www.mathworks.com/help/deeplearning/ref/inceptionv3.html

Wikipedia Contributors. (2020, May 2). I, Robot (film). Retrieved May 5, 2020, from Wikipedia website: https://en.wikipedia.org/wiki/I,_Robot_(film)
