DATA SCIENCE FOR BEGINNERS

End to End Deep Learning Project: Part 1

Implementing an EfficientNet image classification model for transfer learning with Keras

Dr. Varshita Sher
Towards Data Science
19 min read · Nov 18, 2021


Note: This is the first part of a two-part series implementing a deep-learning project from scratch. Part 1 covers the setup for the problem statement, data preprocessing, the intuition behind transfer learning, feature extraction, fine-tuning, and model evaluation. Part 2 covers the implementation of the Flask app and its subsequent deployment on Heroku. Please follow the tutorials in order to maintain continuity. Code on Github.

You can play around with the Flask app here.

Introduction

I have been fortunate enough to work in environments where (a) infrastructure and architecture for data generation was readily available, (b) data wrangling was handled by analysts, and (c) MLOps was handled by a separate division of data engineers. These perks have given me the freedom to focus on the thing I love the most — data modeling. Having said that, I always wanted to learn a few basics at the very least, if I ever had to do an entire project on my own. This is precisely the motivation behind this article.

I decided to implement an end-to-end DL project because deep learning brings a few challenges of its own, pertaining mainly to deployment (due to the size of the models we must deal with) and to fine-tuning the model to suit our particular use case.

The project will consist of three parts:

  • Part 1: Setup (virtual environment, training dataset, etc.), Model Training (fine-tuning with Keras, learning curve monitoring, etc.), Testing.
  • Part 2: Building a Flask app and deployment on Heroku.

The aim of the two-part series is to provide you with source code, tips, tricks, and familiarity with common runtime errors when working with deep learning models. I am sure these will come in handy while explaining projects during data science interviews.

Heads-up: Some of the stuff in this (and subsequent) article will be discussed in excruciating detail, as the aim is for people (especially early-stage researchers) to understand the reasons/pros/cons behind some design decisions and answer them flawlessly if probed during interviews.

Part 1: Setup

Virtual Environment

Using the terminal, create a virtual environment called e2eproject inside the project directory and activate it.

python3 -m venv e2eproject
source e2eproject/bin/activate

Dataset

We will be working with the publicly available House Room Dataset from Kaggle.

You can download it manually and later move it into your project directory OR use the following command in the terminal to download it directly into your project directory.
P.S.: Make sure you are inside the project directory before running the following command.

kaggle datasets download -d robinreni/house-rooms-image-dataset --unzip

Task

We will be working on an image classification task. In particular, we will be developing a model that can detect whether a house interior is modern (class M) or old (class O) given an image of its bedroom. Such a model may find utility for property valuations during remortgaging or at the time of selling a property.

As you may have already noticed, the dataset is unlabelled; however, one of my friends generously offered to hand-label ~450 images. (The labels have been provided in the Github repo.) Although this is not a substantial dataset size, we were still able to achieve almost 80% accuracy on a held-out test set. Additionally, appropriate techniques for fine-tuning, improving model metrics, etc. will be discussed to ascertain whether it is worth spending more time labeling additional data points.

Part 2: Model Training

Let’s create the model.ipynb notebook.

Installations

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from imutils import paths
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import seaborn as sns
import shutil
import os

Note: You might have to run a few pip install XXX commands to get the above cell working.

Helper Variables & Functions

ORIG_INPUT_DATASET = "House_Room_Dataset/Bedroom"
TRAIN = "training"
VAL = "evaluation"
TEST = "testing"
BASE_PATH = "dataset"
BATCH_SIZE = 32
CLASSES = ["Modern", "Old"]

We will only be working with the bedroom images, hence ORIG_INPUT_DATASET points to the bedroom sub-directory. BASE_PATH is the path to the directory where we will be storing the train, test, and validation splits for the images. This will be empty initially.

def plot_hist(hist, metric):
    if metric == 'auc':
        plt.plot(hist.history["auc"])
        plt.plot(hist.history["val_auc"])
    else:
        plt.plot(hist.history["loss"])
        plt.plot(hist.history["val_loss"])
    plt.style.use("ggplot")
    plt.title("model {}".format(metric))
    plt.ylabel("{}".format(metric))
    plt.xlabel("epoch")
    plt.legend(["train", "validation"], loc="upper left")
    plt.show()

This is some boilerplate code for plotting two types of learning curves: AUC vs. epoch and loss vs. epoch.

Note: If you are working with a metric other than auc, say accuracy, make sure to update auc with accuracy and val_auc with val_accuracy in the code snippet above.

Loading labels

(labels.txt has been made available as part of the repo.)

# Reading labels from the txt file
with open("labels.txt", 'r') as f:
    manual_labels = f.read()

# Extracting individual labels into a list
labels = [i for i in manual_labels]
len(labels)
********* OUTPUT **********
451

To check whether the dataset is balanced:

from collections import Counter

print(Counter(labels).keys())
print(Counter(labels).values())
********* OUTPUT **********
dict_keys(['O', 'M'])
dict_values([271, 180])

Looks like we have more old houses compared to modern ones in our dataset (although not by a very large margin). Hence, it makes sense to ditch accuracy and pick a metric that is more suitable to deal with class imbalance, namely AUC (Area Under ROC curve).

Train Test Validation Splits

Before we do the splitting, it’s important to sort the filenames because we have the labels for the first 451 images (in House_Room_Dataset/Bedroom subdirectory) and not just any random 451 images. By default, os.listdir() returns the files in some random order and we shouldn't rely on it.

# sorting files in the order they appear
files = os.listdir(ORIG_INPUT_DATASET)
files.sort(key=lambda f: int(f.split('_')[1].split('.')[0]))
# checking to see the correct file order
files[:5]
********* OUTPUT **********
['bed_1.jpg', 'bed_2.jpg', 'bed_3.jpg', 'bed_4.jpg', 'bed_8.jpg']

Now that we know we have the correct 451 images, let’s proceed to the train-test-validation splits. We will allocate ~ 75%, 15%, and 10% of the data for training, validation, and testing, respectively.

# splitting files into train and test sets
trainX, testX, trainY, testY = train_test_split(files[:len(labels)],
                                                labels,
                                                stratify=labels,
                                                train_size=0.90)

# further splitting of train set into train and val sets
trainX, valX, trainY, valY = train_test_split(trainX, trainY,
                                              stratify=trainY,
                                              train_size=0.85)

# Checking the size of train, test, eval
len(trainX), len(trainY), len(valX), len(valY), len(testX), len(testY)
********* OUTPUT **********
(344, 344, 61, 61, 46, 46)

Using Sklearn’s train_test_split() method, we first split the entire dataset into train and test sets, followed by a second split of the train data into train and validation sets. It is important to stratify by labels because we want a proportional distribution of both modern and old images in all three sets — train, test, and validation.
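As a quick sanity check (a small sketch, not part of the original notebook), we can verify the stratification worked by comparing class proportions across the three splits:

from collections import Counter

# the M/O ratio should be roughly the same in all three sets
for name, y in [("train", trainY), ("val", valY), ("test", testY)]:
    counts = Counter(y)
    print(name, dict(counts), round(counts['O'] / len(y), 2))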

Building the training dataset directories

Later on in the code, you’ll notice that during training we won’t be loading the entire dataset into memory. Instead, we will make use of Keras’s .flow_from_directory() function to allow for batch processing. However, this function expects the data to be organized into directories as follows:

Fig 1: Directory structure for reading images in batches in Keras. Class M and O refer to Modern and Old, respectively.

To get our image files organized in the above format, we will make use of this short snippet:
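The snippet itself is embedded in the notebook on Github; here is a minimal sketch of what it does (assuming class sub-directories named after the labels themselves, i.e. M and O):

# Sketch: copy each image into dataset/<split>/<label>/ so that
# flow_from_directory() can later read the splits in batches.
splits = [(TRAIN, trainX, trainY), (VAL, valX, valY), (TEST, testX, testY)]

for (split, filenames, split_labels) in splits:
    print("Building '{}' split".format(split))
    for (filename, label) in tqdm(list(zip(filenames, split_labels))):
        dest_dir = os.path.join(BASE_PATH, split, label)   # e.g. dataset/training/M
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)
        shutil.copy2(os.path.join(ORIG_INPUT_DATASET, filename),
                     os.path.join(dest_dir, filename))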

While the code snippet runs, you should be able to track the progress via the tqdm module, and once it finishes, you'll find three new sub-directories created: dataset/training, dataset/evaluation, and dataset/testing. Within each of these, there will be two sub-sub-directories, one each for modern and old houses.

As a sanity check, let's see whether we have the expected number of images in each subdirectory.

trainPath = os.path.join(BASE_PATH, TRAIN)
valPath = os.path.join(BASE_PATH, VAL)
testPath = os.path.join(BASE_PATH, TEST)
totalTrain = len(list(paths.list_images(trainPath)))
totalVal = len(list(paths.list_images(valPath)))
totalTest = len(list(paths.list_images(testPath)))
print(totalTrain, totalTest, totalVal)
********* OUTPUT **********
344 46 61

Note: If your custom data is in the structure described below, there is a useful python package called split_folders that can be used to get the data in the directory structure defined in Fig 1.

dataset/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        img3.jpg
        ...
    ...
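For reference, here is a minimal usage sketch (assuming the package is installed via pip install split-folders and imported as splitfolders):

import splitfolders

# split dataset/ (one folder per class) into train/val/test
# using the same ~75/15/10 ratios as above
splitfolders.ratio("dataset", output="dataset_split",
                   seed=42, ratio=(0.75, 0.15, 0.10))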

Image Preprocessing

Because we are dealing with a rather limited sample size, it is often recommended to randomly augment images using rotations, zooming, translations, etc.

While it might be tempting to think that data augmentation increases the amount of training data available, what it actually does is take a training sample and apply a random transformation to it [Source]. Overall, the sample size remains the same.

Keras allows random augmentations for brightness, rotation, zoom, shear, etc. using the ImageDataGenerator, and the best part is that all of this is done on the fly during model fitting, i.e. you need not compute them in advance.

Training data augmentation:

trainAug = ImageDataGenerator(
    rotation_range=90,
    zoom_range=[0.5, 1.0],
    width_shift_range=0.3,
    height_shift_range=0.25,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest",
    brightness_range=[0.2, 1.0]
)

Most parameters, such as width_shift_range, height_shift_range, zoom_range, and rotation_range, should be intuitive (if not, have a look at the official Keras documentation).

An important thing to note is that when you perform, say zooming or rotation, some empty areas/pixels might be created in the image which must be filled using the appropriate technique mentioned in fill_mode.
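If you'd like to eyeball these transformations, here is a quick sketch (bed_1.jpg is just an assumed example file) that draws a few random augmentations of a single image:

from tensorflow.keras.preprocessing.image import load_img, img_to_array

# load one image and add a batch dimension, since the generator expects batches
img = img_to_array(load_img(os.path.join(ORIG_INPUT_DATASET, "bed_1.jpg")))
img = np.expand_dims(img, axis=0)

# plot four random augmentations of the same image
for i, batch in enumerate(trainAug.flow(img, batch_size=1)):
    plt.imshow(batch[0].astype("uint8"))
    plt.show()
    if i == 3:
        break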

Validation data augmentation:

valAug = ImageDataGenerator()

You’ll observe that we have provided no arguments when initializing the data augmentation object for the validation data. This means the default values are used, which apply no transformations. In other words, we are not applying any augmentations (no zooming, width shifts, horizontal flips, etc.) to the validation set, because this set should be treated like a test set when evaluating the model during training.

Testing data augmentation:

testAug = ImageDataGenerator()

Following the same logic as above, we are not applying any augmentations to the test set.

Creating data generators

As mentioned earlier, we need to create some data generators which will keep feeding these augmented images into batches to the model during training. To do so, we can use the flow_from_directory() generator function.

# Create training batches whilst creating augmented images on the fly
trainGen = trainAug.flow_from_directory(
    directory=trainPath,
    target_size=(224,224),
    save_to_dir='dataset/augmented/train',
    save_prefix='train',
    shuffle=True
)

# Create val batches
valGen = valAug.flow_from_directory(
    directory=valPath,
    target_size=(224,224),
    shuffle=True
)

A few important things to consider:

  • In each case, the directory is set to the path where the training (or validation) images reside.
  • Specifying the target_size as (224,224) ensures all images will be resized to this size.
  • We are also going to set save_to_dir as the path to the directory where we are going to save the augmented images (with the prefix specified in save_prefix) that will be created on the fly during training. This provides a good sanity check to see if the images are getting randomly transformed as they should. Note: If you’d like to check this beforehand, i.e. before training begins, here’s a quick snippet I found on StackOverflow.
  • Finally, shuffle is set to True because we want the samples to be shuffled within the batch generator so that when a batch is requested by model.fit(), random samples are given. Doing so will ensure batches between epochs don’t look alike and will eventually make the model more robust.
# Create test batches
testGen = testAug.flow_from_directory(
    directory=testPath,
    target_size=(224,224),
    shuffle=False
)

Other than setting the correct directory path for testGen, there is one main thing to consider:

  • shuffle must be set to False.

Why, you ask?
Because now we don’t want the samples to be shuffled within the test batch generator. Only when shuffle is set to False will the batches be created in the same order as the filenames provided. This is needed to match the true labels (accessible using testGen.classes) with the predicted labels during model evaluation.

Fun fact: If you check the output of trainGen.classes right now hoping that they would be shuffled, you would be disappointed. Why? Because the shuffling happens on-the-fly when a batch is requested during the time of model fitting. [StackOverflow].
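You can see this in action by requesting a batch manually (a tiny sketch; the shapes assume the default batch size of 32 and our two classes):

# make sure trainGen's save_to_dir exists before requesting a batch
os.makedirs('dataset/augmented/train', exist_ok=True)

# shuffling and augmentation happen here, at batch-request time
images, labels_batch = next(trainGen)
print(images.shape, labels_batch.shape)   # expected: (32, 224, 224, 3) (32, 2)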

Intuition behind training process

We could have trained a model from scratch but that is bound to underperform — mainly because we have such a small dataset. In such scenarios, it makes sense to harness the power of transfer learning.

Transfer learning refers to the process of fine-tuning a pretrained model on a new dataset. This enables it to recognize classes it was never trained on!

In a nutshell, transfer learning allows us to leverage the knowledge a model gained during training to recognize dogs from cats, such that it can now be used to predict whether a house interior is modern or not.

But why does it work?
Because any base model we pick (i.e. the pretrained model) has usually been trained on a very large corpus of images, it is capable of learning good general-purpose vector representations of images. All that is left to do is use these representations when distinguishing between our custom classes (in our case, old vs. modern houses).

Acknowledgment: I would like to take a moment and give a big shoutout to a few blogs (this, this, and this) that I found during the research phase for this article. These turned out to be true gems and helped me understand the concept of transfer learning in detail. I truly appreciate all your insights which have allowed me to simplify the code/explanations for my readers.

Transfer Learning using Keras

There are two main steps involved in transfer learning:

  • Feature Extraction: Take a pre-trained model (and freeze its weights) as the base model and then train a new classifier* on top such that it outputs exactly N values (where N is the number of classes).
  • [optional] Fine Tuning: Once the classifier is trained, unfreeze a few** layers from the base model so that it adapts well to the new dataset.

*The new classifier can be:

  • a stack of Dense layers (i.e. fully connected layers).

OR

  • a single global pooling layer (which downsizes each feature map to a single value: maxpool, avgpool). This is preferred because there is less overfitting, as there are 0 parameters to optimize (and hence it is our choice for this article).

** How many layers to unfreeze can vary depending on how different your dataset is from the one the pre-trained model was initially trained on. Keep in mind that if the two datasets are quite similar, it may be beneficial to unfreeze only a fraction of all layers.

The fine-tuning step, although optional, is quite crucial for use cases where your custom dataset is quite different from the dataset on which the base model was trained. Also, this may require more epochs compared to the feature extraction step. Because more epochs roughly translate to higher chances of overfitting, it is recommended to use early stopping (of model training) after careful monitoring of the loss/accuracy curves.

Intuition behind model selection

Coming to the million-dollar question: which model should we select as the base model for fine-tuning? Clearly, there are quite a few options available, as can be found in the Keras documentation here. While my initial choice was ResNet-50 due to its popularity, I finally decided to proceed with EfficientNet because it can achieve results similar to SOTA models while requiring fewer FLOPS. Also, the paper mentions that its performance is on par with SOTA models on transfer learning tasks whilst requiring 9.6x fewer parameters on average. Wohoo ⭐️

There are quite a few flavors of the EfficientNet models (EfficientNetB0, EfficientNetB1, …, EfficientNetB7), and they differ slightly in architecture (i.e. network depth, width) and resource requirements. Each of these models expects images of a particular shape, as described in this table (B0 expects 224x224 inputs). Given we are working with 224x224 resolution images, we will go with EfficientNetB0.

Model training for Feature Extraction step

Note: We will be using Tensorflow’s Keras API for this tutorial. If you are new to Keras, I have already written two beginner-level Keras tutorials (Part1, Part2) that cover network architecture, neurons, activation functions, hidden layers (Dense, Dropout, MaxPool, Flatten), etc. in much more detail than is discussed here. Feel free to refer to them for a quick refresher!

We begin with creating an EfficientNetB0 base model using imagenet weights.

baseModel = EfficientNetB0(
    weights="imagenet",
    include_top=False,  # make sure top layer is not included
    input_tensor=Input(shape=(224, 224, 3)),
    pooling="avg"
)

A few things to consider:

  • include_top must be set to False because the top layer (i.e. the final layer) in the EfficientNet network architecture is a Dense layer that outputs 1000 classes corresponding to the ImageNet dataset. We clearly don’t need this!
  • If you remember correctly, we opted for the new classifier to be a global pooling layer (instead of a stack of dense layers). The good news is that the Keras API already allows us to do this whilst instantiating the EfficientNetB0 object: we can simply set the pooling parameter to avg. The default is None.

The next step is to freeze the weights by setting trainable for each layer as False:

# freeze the weights
for layer in baseModel.layers:
    layer.trainable = False

Now it’s time to create a new classifier on top which will spit out exactly two classes (M or O). To do so, we need to make sure the final layer of this classifier model is a Dense layer with two output neurons. In between, we have also included BatchNormalization and Dropout layers for regularization.

# training a new classifier on top (Functional Keras Model)
x = baseModel.output
Layer_1 = BatchNormalization()(x)
Layer_2 = Dropout(0.5)(Layer_1)
output_layer = Dense(len(CLASSES), activation="softmax")(Layer_2)
model = Model(inputs=baseModel.input, outputs=output_layer)

Note: There are two ways to build this Keras classifier model: sequential (the most basic one) and functional (for complex networks with multiple inputs/outputs). The code snippet above is written as a functional network because it lends more clarity to the network architecture if you were to check it using model.summary(). Alternatively, we could have created a Sequential model like the one below, and the results would be the same.

# Another way to create the classifier on top of the base model
model = tf.keras.Sequential()
model.add(baseModel)
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(len(CLASSES), activation="softmax"))

Finally, let’s compile the model with the Adam optimizer and a relatively large learning_rate = 1e-3. Since we have two possible output classes, we will be monitoring the binary_crossentropy loss (use categorical_crossentropy if you’re dealing with more than two classes) and assessing the model’s usefulness based on the AUC metric implemented in tf.keras.metrics.AUC.

# compile
opt = Adam(learning_rate=1e-3)
model.compile(optimizer=opt,
              loss='binary_crossentropy',
              metrics=[tf.keras.metrics.AUC()])

One last thing to do before training the model using fit() is implementing EarlyStopping and ModelCheckpoint.

The former will ensure that the model does not train for more epochs than necessary. This is done by monitoring val_loss; as soon as there are no further improvements, i.e. it cannot be minimized further, training is stopped.

The latter will save the best model at the given file path (in our case, feature_extraction.h5). We are again going to monitor the validation loss and save the best model across all epochs.

Note: Here’s an excellent article explaining both EarlyStopping and ModelCheckpoint implementation in more detail!

# implementing early stopping
es = EarlyStopping(
    monitor='val_loss',  # metric to monitor
    mode='min',          # whether to min or max the metric monitored
    patience=10,         # epochs to wait before declaring stopped training
    verbose=1            # output epoch when training was stopped
)

# implementing model checkpoint
mc = ModelCheckpoint(
    'feature_extraction.h5',
    monitor='val_loss',
    mode='min',
    verbose=1,           # display epoch+accuracy every time model is saved
    save_best_only=True
)

Finally, it’s time for model training:

# Training the model
hist = model.fit(
    x=trainGen,
    epochs=25,
    verbose=2,
    validation_data=valGen,
    steps_per_epoch=totalTrain // BATCH_SIZE,
    callbacks=[es, mc]
)

Loss and AUC scores at the final epoch

Taking a quick look at the AUC and loss curves, we can find evidence for model convergence (meaning the model is ready for the fine-tuning step).
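For reference, the curves below can be reproduced with the plot_hist helper defined earlier:

plot_hist(hist, metric='auc')
plot_hist(hist, metric='loss')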

Learning curves for AUC and loss

One interesting observation from the graph on the right is that our validation loss was lower than the training loss. At first, I thought there was some data leakage issue, but then I found this excellent article explaining why this is totally normal and can sometimes happen during training.

To summarize two possible reasons (from the article itself):

  • Reason #1: Regularization (such as dropout) is applied only during training and not during validation. Since regularization sacrifices training accuracy to improve validation/test accuracy, validation loss can go lower than training loss.
  • Reason #2: Our validation set is too small (only 61 images) and perhaps it was easier than the training set, i.e. an unrepresentative validation dataset.

Model Testing after Feature Extraction Step

We are going to use some boilerplate code for evaluating the model predictions obtained using .predict(). Bear in mind that each row of predIdxs will be something like [0.8, 0.2], i.e. the softmax values for both class M and class O, so make sure you pick the index of the larger of the two using np.argmax. We use testGen.class_indices to check the mapping from class names to class indices.

testGen.reset()
predIdxs = model.predict(
    x=testGen,
    steps=(totalTest // BATCH_SIZE) + 1
)
predIdxs = np.argmax(predIdxs, axis=1)
print("No. of test images", len(predIdxs))
print(testGen.class_indices)

cm = confusion_matrix(testGen.classes, predIdxs)
heatmap = sns.heatmap(cm, annot=True)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
********* OUTPUT **********
No. of test images 46
{'M': 0, 'O': 1}
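Since classification_report was imported at the top, it is also worth printing per-class precision and recall to complement the confusion matrix:

# per-class precision/recall/F1 on the test set
print(classification_report(testGen.classes, predIdxs,
                            target_names=list(testGen.class_indices.keys())))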

Model Training for Fine-Tuning step

We will begin by unfreezing the last few layers of the current model; however, one shouldn’t go about randomly turning layers on or off. There are numerous techniques and tips available for fine-tuning models (see this and this as examples), but here are the ones I found most useful:

  • When compiling the model at this step, use an even smaller learning rate compared to the feature extraction step. A smaller learning rate means more epochs will be needed, since smaller changes are made to the network weights on each update.
  • The BatchNormalization layers need to be kept frozen.
  • Within a network architecture, a convolution block needs to be turned on or off in its entirety.
    For instance: consider some of the last rows of the output from model.summary() shown below. As you can see, the layers are neatly organized into blocks, with block7a as the final block. As a starting point, we will unfreeze all layers in block7a (any BatchNormalization layers, however, will be left as is) plus the 7 layers that follow it (most of which were defined by us when we built the new classifier head). In total, the last 20 layers of the network are candidates for unfreezing.
____________________________________________________________________
Layer (type)                     Output Shape         Param #
====================================================================
.
.
.
block6d_project_conv (Conv2D)    (None, 7, 7, 192)    221184
____________________________________________________________________
block6d_project_bn (BatchNormal  (None, 7, 7, 192)    768
____________________________________________________________________
block6d_drop (Dropout)           (None, 7, 7, 192)    0
____________________________________________________________________
block6d_add (Add)                (None, 7, 7, 192)    0
____________________________________________________________________
block7a_expand_conv (Conv2D)     (None, 7, 7, 1152)   221184
____________________________________________________________________
block7a_expand_bn (BatchNormali  (None, 7, 7, 1152)   4608
____________________________________________________________________
block7a_expand_activation (Acti  (None, 7, 7, 1152)   0
____________________________________________________________________
block7a_dwconv (DepthwiseConv2D  (None, 7, 7, 1152)   10368
____________________________________________________________________
.
.
.

I have bundled the code for fine-tuning into a function called fine_tune_model(). Most of the code is repeated from the feature extraction step.

def fine_tune_model(model):
    # unfreeze last conv block i.e. block7a
    for layer in model.layers[-20:]:
        if not isinstance(layer, BatchNormalization):
            layer.trainable = True

    # check which of these are trainable and which aren't
    for layer in model.layers:
        print("{}: {}".format(layer, layer.trainable))

    # compile (with an even smaller learning rate)
    opt = Adam(learning_rate=1e-5)
    model.compile(
        optimizer=opt,
        loss='binary_crossentropy',
        metrics=[tf.keras.metrics.AUC()]
    )
    return model

model_fine_tuned = fine_tune_model(model)

Because fine-tuning will also make use of the same data generators, i.e. trainGen, valGen, and testGen, it is important to reset them so they start with the very first sample in the dataset.

trainGen.reset()
valGen.reset()
testGen.reset()

Finally, let’s set the early stopping and model checkpoint (notice we have increased patience to 20 as we are now going to train for longer i.e. 50 epochs) and get the training started.

# implementing early stopping
es_tune = EarlyStopping(
    monitor='val_loss',
    mode='min',
    patience=20,
    verbose=1
)

# implementing model checkpoint
mc_tune = ModelCheckpoint(
    'fine_tuned_house.h5',
    monitor='val_loss',
    mode='min',
    verbose=1,
    save_best_only=True
)

hist = model_fine_tuned.fit(
    x=trainGen,
    steps_per_epoch=totalTrain // BATCH_SIZE,
    validation_data=valGen,
    epochs=50,
    verbose=2,
    callbacks=[es_tune, mc_tune]
)

Model Testing after Fine-Tuning Step
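The evaluation code mirrors the feature extraction step. One small sketch worth adding (assuming the checkpoint file from above): reload the best model saved by ModelCheckpoint, rather than using whatever weights the final epoch left behind.

from tensorflow.keras.models import load_model

# reload the best checkpoint saved during fine-tuning
model_fine_tuned = load_model('fine_tuned_house.h5')

testGen.reset()
predIdxs = model_fine_tuned.predict(x=testGen,
                                    steps=(totalTest // BATCH_SIZE) + 1)
predIdxs = np.argmax(predIdxs, axis=1)
cm = confusion_matrix(testGen.classes, predIdxs)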

Comparing the new confusion matrix with the previous one, we have only managed to increase the number of correctly predicted images by 2 (see the diagonal values in both heatmaps).

As a final sanity check, it’s also good to see whether this fine-tuning step shows any signs of overfitting.

The validation loss is not volatile and is stable at around 0.55, indicating the model has not overfit. In general, the AUC of the validation set predictions does get better with more epochs, but with diminishing returns. (In simpler words, it doesn’t seem like training for longer would help our case substantially.)

At first, I thought the fluctuations in the training curve were due to the batch size, since it plays a role in how the network learns. Similarly, a learning rate that is too large can hinder convergence and cause the loss function to fluctuate or get stuck in local minima. However, neither increasing the batch size nor decreasing the learning rate helped in smoothing the training curve.

Another possible explanation that comes to mind is that the network has reached its capacity with respect to the given dataset, i.e. it can learn no more from it. This is plausible since we are trying to train a relatively large network (remember, we have unfrozen some additional layers, meaning more trainable parameters) using only 344 samples, which cannot provide sufficient information to learn the problem any further.

Note: Before shoving more images into the training process in the hopes of improving the model, it might be worth tinkering around with model hyperparameters, train:val split, choice of pre-trained weights (weights from noisy student training are known to be better than those from ImageNet training), and the network architecture itself.

Future Work

It has been established in this recent paper and described in this video that joint training with both unlabelled and labeled datasets outperforms the pipeline wherein we first pre-train with unlabelled data and then fine-tune on labeled data. This is known as semi-supervised learning and will be the focus of our next tutorial. This would allow us to make full use of the remaining images in our dataset for which it was difficult to obtain labels.

Kudos for sticking around so long. 🥂

Head over to Part 2 for learning how to take this trained model and wrap it within a flask app. We are also going to write a quick and dirty front end and finally deploy the app on Heroku.

Until then :)
