DATA SCIENCE FOR BEGINNERS
End to End Deep Learning Project: Part 1
Implementing an EfficientNet image classification model for transfer learning with Keras
Note: This is the first part of a two-part series implementing a deep-learning project from scratch. Part 1 covers the setup for the problem statement, data preprocessing, the intuition behind transfer learning, feature extraction, fine-tuning, and model evaluation. Part 2 covers the implementation of the Flask app and its subsequent deployment on Heroku. Please follow the tutorials in order for maintaining continuity. Code on Github.
Introduction
I have been fortunate enough to work in environments where (a) infrastructure and architecture for data generation was readily available, (b) data wrangling was handled by analysts, and (c) MLOps was handled by a separate division of data engineers. These perks have given me the freedom to focus on the thing I love the most — data modeling. Having said that, I always wanted to learn a few basics at the very least, if I ever had to do an entire project on my own. This is precisely the motivation behind this article.
I decided to implement an end-to-end DL project since there are a few challenges pertaining mainly to their deployments — due to the size of the models we must deal with — and fine-tuning of the model to suit our particular use case.
The project will consist of two parts:
- Part 1: Setup (virtual environment, training dataset, etc.), Model Training (fine-tuning with Keras, learning curve monitoring, etc.), Testing.
- Part 2: Building a Flask app and deployment on Heroku.
The aim of the two-part series is to provide you with source code, tips, tricks, and familiarity with common runtime errors when working with deep learning models. I am sure these will come in handy while explaining projects during data science interviews.
Heads-up: Some of the material in this (and the subsequent) article will be discussed in excruciating detail, as the aim is for people (especially early-stage researchers) to understand the reasons/pros/cons behind some design decisions and explain them flawlessly if probed during interviews.
Part 1: Setup
Virtual Environment
Using the terminal, create a virtual environment called e2eproject inside the project directory and activate it.
python3 -m venv e2eproject
source e2eproject/bin/activate
Dataset
We will be working with the publicly available House Room Dataset from Kaggle.
You can download it manually and later move it into your project directory OR use the following command in the terminal to download it directly into your project directory.
P.S.: Make sure you are inside the project directory before running the following command.
kaggle datasets download -d robinreni/house-rooms-image-dataset --unzip
Task
We will be working on an image classification task. In particular, we will be developing a model that can detect whether a house interior is modern (class M) or old (class O) given an image of its bedroom. Such a model may find utility for property valuations during remortgaging or at the time of selling a property.
As you may have already noticed, the dataset is unlabelled. However, one of my friends generously offered to hand label ~450 images. (The labels have been provided in the Github repo.) Although this is not a substantial dataset size, we were still able to achieve almost 80% accuracy on a held-out test set. Additionally, appropriate techniques for fine-tuning, improving model metrics, etc. will be discussed to ascertain whether it is worth spending more time labeling additional data points.
Part 2: Model Training
Let’s create the model.ipynb notebook.
Installations
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from imutils import paths
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import seaborn as sns
import shutil
import os
Note: You might have to do a few pip install XXX to get the above cell working.
Helper Variables & Functions
ORIG_INPUT_DATASET = "House_Room_Dataset/Bedroom"
TRAIN = "training"
VAL = "evaluation"
TEST = "testing"
BASE_PATH = "dataset"
BATCH_SIZE = 32
CLASSES = ["Modern", "Old"]
We will only be working with the bedroom images, hence ORIG_INPUT_DATASET points to the bedroom sub-directory. BASE_PATH is the path to the directory where we will be storing the train, test, and validation splits for the images. This will be empty initially.
def plot_hist(hist, metric):
    if metric == 'auc':
        plt.plot(hist.history["auc"])
        plt.plot(hist.history["val_auc"])
    else:
        plt.plot(hist.history["loss"])
        plt.plot(hist.history["val_loss"])
    plt.style.use("ggplot")
    plt.title("model {}".format(metric))
    plt.ylabel("{}".format(metric))
    plt.xlabel("epoch")
    plt.legend(["train", "validation"], loc="upper left")
    plt.show()
This is some boiler-plate code for plotting two types of learning curves — AUC vs. epoch and loss vs. epoch.
Note: If you are working with a metric other than auc, say accuracy, make sure to replace auc with accuracy and val_auc with val_accuracy in the code snippet above.
Loading labels
(labels.txt has been made available as part of the repo.)
# Reading labels from the txt file
with open("labels.txt", 'r') as f:
    manual_labels = f.read()

# Extracting individual labels into a list
labels = [i for i in manual_labels]
len(labels)

********* OUTPUT **********
451
To check whether the dataset is balanced:
from collections import Counter

print(Counter(labels).keys())
print(Counter(labels).values())

********* OUTPUT **********
dict_keys(['O', 'M'])
dict_values([271, 180])
Looks like we have more old houses compared to modern ones in our dataset (although not by a very large margin). Hence, it makes sense to ditch accuracy and pick a metric that is more suitable to deal with class imbalance, namely AUC (Area Under ROC curve).
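To make the point concrete, here is a minimal sketch in plain Python that uses only the class counts printed above. It scores a trivial "always predict O" baseline under both metrics. The pairwise_auc helper is a hypothetical illustration of the ranking definition of AUC, not the implementation Keras uses.

```python
from collections import Counter

# Class counts reported above: 271 old ('O') and 180 modern ('M') images.
labels = ["O"] * 271 + ["M"] * 180
counts = Counter(labels)

# A model that always predicts the majority class still looks decent on accuracy.
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

def pairwise_auc(y_true, scores, positive="O"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# The same constant predictor gives every image the same score, so AUC is 0.5.
constant_scores = [1.0] * len(labels)
baseline_auc = pairwise_auc(labels, constant_scores)

print(majority_class, round(baseline_accuracy, 3), baseline_auc)
```

The baseline reaches roughly 60% accuracy without learning anything, while its AUC stays at chance level, which is why AUC is the more informative number to monitor here.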
Train Test Validation Splits
Before we do the splitting, it’s important to sort the filenames because we have the labels for the first 451 images (in the House_Room_Dataset/Bedroom subdirectory) and not just any random 451 images. os.listdir() returns the files in arbitrary order, so we shouldn't rely on it.
# sorting files in the order they appear
files = os.listdir(ORIG_INPUT_DATASET)
files.sort(key=lambda f: int(f.split('_')[1].split('.')[0]))

# checking to see the correct file order
files[:5]

********* OUTPUT **********
['bed_1.jpg', 'bed_2.jpg', 'bed_3.jpg', 'bed_4.jpg', 'bed_8.jpg']
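The numeric sort key matters because plain string sorting would misplace multi-digit file numbers. A small sketch with hypothetical filenames (following the bed_<number>.jpg pattern seen above) shows the difference:

```python
# Hypothetical filenames following the bed_<number>.jpg pattern seen above.
files = ["bed_10.jpg", "bed_2.jpg", "bed_1.jpg", "bed_100.jpg", "bed_8.jpg"]

# Plain string sorting compares character by character, so "bed_10" sorts before "bed_2".
lexicographic = sorted(files)

# Sorting on the integer between "_" and "." restores the intended order.
numeric = sorted(files, key=lambda f: int(f.split("_")[1].split(".")[0]))

print(lexicographic)
print(numeric)
```

If the labels were matched against the lexicographic order, most of them would end up attached to the wrong image.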
Now that we know we have the correct 451 images, let’s proceed to the train-test-validation splits. We will allocate ~ 75%, 15%, and 10% of the data for training, validation, and testing, respectively.
# splitting files into train and test sets
trainX, testX, trainY, testY = train_test_split(files[:len(labels)],
                                                labels,
                                                stratify=labels,
                                                train_size=0.90)

# further splitting of train set into train and val sets
trainX, valX, trainY, valY = train_test_split(trainX, trainY, stratify=trainY, train_size=0.85)

# Checking the size of train, test, eval
len(trainX), len(trainY), len(valX), len(valY), len(testX), len(testY)

********* OUTPUT **********
(344, 344, 61, 61, 46, 46)
Using Sklearn’s train_test_split() method, we first split the entire dataset into train and test sets, followed by a second split of the train data into train and validation sets. It is important to stratify by labels because we want a proportional distribution of both modern and old images in all three sets: train, test, and validation.
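The split sizes in the output can be checked by hand. The sketch below assumes the training share is rounded down and the remainder goes to the other split, which is consistent with the numbers reported above; it is an arithmetic check, not a call into scikit-learn.

```python
import math

n_labelled = 451  # number of hand-labelled images

# First split: 90% for train + validation, the remainder for testing.
n_trainval = math.floor(0.90 * n_labelled)
n_test = n_labelled - n_trainval

# Second split: 85% of that portion for training, the rest for validation.
n_train = math.floor(0.85 * n_trainval)
n_val = n_trainval - n_train

sizes = {"train": n_train, "val": n_val, "test": n_test}
shares = {name: round(100 * count / n_labelled, 1) for name, count in sizes.items()}

print(sizes)
print(shares)
```

The resulting shares come out at about 76%, 14%, and 10%, in line with the approximate 75/15/10 allocation mentioned earlier.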
Building the training dataset directories
Later on in the code, you’ll notice that during training we won’t be loading the entire dataset into memory. Instead, we will make use of Keras’s .flow_from_directory() function to allow for batch processing. However, this function expects the data to be organized into directories as shown in Fig 1: one sub-directory per split (training, evaluation, testing), each containing one sub-sub-directory per class.
To get our image files organized in the above format, we will make use of this short snippet:
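The original snippet is not reproduced in this text, so what follows is only a minimal sketch of what such a step could look like. It assumes the split variables defined earlier (trainX, trainY, and so on) and that each class folder is named after its label character (M or O), which matches the class_indices printed later in the article. The helper name build_split_dir is hypothetical, and the tqdm progress bar is omitted for brevity.

```python
import os
import shutil

def build_split_dir(filenames, labels, src_dir, base_path, split_name):
    """Copy each image into base_path/split_name/<label>/ (hypothetical helper)."""
    for filename, label in zip(filenames, labels):
        dest_dir = os.path.join(base_path, split_name, label)
        os.makedirs(dest_dir, exist_ok=True)  # creates e.g. dataset/training/M
        shutil.copy2(os.path.join(src_dir, filename), os.path.join(dest_dir, filename))

# Intended usage with the variables defined earlier in the article
# (run only once the dataset and the splits exist):
# build_split_dir(trainX, trainY, ORIG_INPUT_DATASET, BASE_PATH, TRAIN)
# build_split_dir(valX, valY, ORIG_INPUT_DATASET, BASE_PATH, VAL)
# build_split_dir(testX, testY, ORIG_INPUT_DATASET, BASE_PATH, TEST)
```

Copying (rather than moving) keeps the original download intact, so the splits can be rebuilt later with different proportions.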
While the code snippet runs, you should be able to see the progress using the tqdm module. Once it finishes, you’ll find three new sub-directories created (dataset/training, dataset/evaluation, and dataset/testing), and within each of these there will be two sub-sub-directories, one each for modern and old houses.
As a sanity check, let’s confirm that we have the expected number of images in each subdirectory.
trainPath = os.path.join(BASE_PATH, TRAIN)
valPath = os.path.join(BASE_PATH, VAL)
testPath = os.path.join(BASE_PATH, TEST)

totalTrain = len(list(paths.list_images(trainPath)))
totalVal = len(list(paths.list_images(valPath)))
totalTest = len(list(paths.list_images(testPath)))

print(totalTrain, totalTest, totalVal)

********** OUTPUT *******
344 46 61
Note: If your custom data is in the structure described below, there is a useful python package called split_folders that can be used to get the data in the directory structure defined in Fig 1.
dataset/
    class1/
        img1.jpg
        img2.jpg
        ...
    class2/
        img3.jpg
        ...
    ...
Image Preprocessing
Because we are dealing with a rather limited sample size, it is often recommended to randomly augment images using rotations, zooming, translations, etc.
While it might be tempting to think that data augmentation increases the amount of training data available, what it actually does is take a training sample and apply a random transformation to it [Source]. Overall, the sample size remains the same.
Keras allows random augmentations for brightness, rotation, zoom, shear, etc. using the ImageDataGenerator and the best part is that all this is done on the fly during model fit i.e. you need not compute them in advance.
Training data augmentation:
trainAug = ImageDataGenerator(
    rotation_range=90,
    zoom_range=[0.5, 1.0],
    width_shift_range=0.3,
    height_shift_range=0.25,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest",
    brightness_range=[0.2, 1.0]
)
Most parameters such as width_shift_range, height_shift_range, zoom_range and rotation_range should be intuitive (if not, have a look at the official Keras documentation).
An important thing to note is that when you perform, say, zooming or rotation, some empty areas/pixels might be created in the image, which must be filled using the technique specified in fill_mode.
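As a toy illustration of what "nearest" filling means, the sketch below shifts a single row of pixel values to the right and fills the gap with the closest edge pixel. It is a one-dimensional simplification written for this article (the function name is hypothetical), not how Keras implements the transformation internally.

```python
def shift_right_nearest(row, shift):
    """Shift a 1-D row of pixels right by `shift`, filling the gap with the nearest edge pixel."""
    if shift <= 0:
        return list(row)
    shift = min(shift, len(row))
    # The vacated positions on the left repeat the first original pixel.
    return [row[0]] * shift + list(row[: len(row) - shift])

row = [10, 20, 30, 40, 50]
print(shift_right_nearest(row, 2))
```

The two positions vacated by the shift repeat the value 10 instead of being left blank, which is the behaviour fill_mode="nearest" asks for.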
Validation data augmentation:
valAug = ImageDataGenerator()
You’ll observe that we have provided no arguments when initializing the data augmentation object for the validation data. This means we are going to use the default value for all of them, which leaves every augmentation disabled. In other words, we are not applying any augmentations (no zooming, width shifts, horizontal flips, etc.) to the validation set because this set should be treated as a test set when evaluating the model during training.
Testing data augmentation:
testAug = ImageDataGenerator()
Following the same logic as above, we are not applying any augmentations to the test set.
Creating data generators
As mentioned earlier, we need to create some data generators which will keep feeding these augmented images in batches to the model during training. To do so, we can use the flow_from_directory() generator function.
# save_to_dir must already exist before the generator writes to it
os.makedirs('dataset/augmented/train', exist_ok=True)

# Create training batches whilst creating augmented images on the fly
trainGen = trainAug.flow_from_directory(
    directory=trainPath,
    target_size=(224,224),
    save_to_dir='dataset/augmented/train',
    save_prefix='train',
    shuffle=True
)

# Create val batches
valGen = valAug.flow_from_directory(
    directory=valPath,
    target_size=(224,224),
    shuffle=True
)
A few important things to consider:
- In each case, the directory is set to the path where the training (or validation) images reside.
- Specifying the target_size as (224,224) ensures all images will be resized to this size.
- We are also going to set save_to_dir as the path to the directory where we are going to save the augmented images (with the prefix specified in save_prefix) that will be created on the fly during training. This provides a good sanity check to see if the images are getting randomly transformed as they should. Note: If you’d like to check this beforehand, i.e. before training begins, here’s a quick snippet I found on StackOverflow.
- Finally, shuffle is set to True because we want the samples to be shuffled within the batch generator so that when a batch is requested by model.fit(), random samples are given. Doing so will ensure batches between epochs don’t look alike and will eventually make the model more robust.
# Create test batches
testGen = testAug.flow_from_directory(
    directory=testPath,
    target_size=(224,224),
    shuffle=False
)
Other than setting the correct directory path for testGen, there is one main thing to consider: shuffle must be set to False.
Why, you ask?
Because now we don’t want the samples to be shuffled within the test batch generator. Only when shuffle is set to False will the batches be created in the same order as the filenames provided. This is needed to match the true labels (accessible using testGen.classes) with the predicted labels during model evaluation.
Fun fact: If you check the output of trainGen.classes right now hoping that they would be shuffled, you would be disappointed. Why? Because the shuffling happens on the fly when a batch is requested during model fitting. [StackOverflow].
Intuition behind training process
We could have trained a model from scratch but that is bound to underperform — mainly because we have such a small dataset. In such scenarios, it makes sense to harness the power of transfer learning.
Transfer learning refers to the process of fine-tuning a pretrained model on a new dataset. This enables it to recognize classes it was never trained on!
In a nutshell, transfer learning allows us to leverage the knowledge a model gained during training to recognize dogs from cats, such that it can now be used to predict whether a house interior is modern or not.
But why does it work?
Because any base model we pick (i.e. the pretrained model) is usually trained on such a large corpus of images, it is capable of learning good vector representations of images, in general. All that is left to do is use these representations when distinguishing between custom classes (in our case, old vs modern houses).
Acknowledgment: I would like to take a moment and give a big shoutout to a few blogs (this, this, and this) that I found during the research phase for this article. These turned out to be true gems and helped me understand the concept of transfer learning in detail. I truly appreciate all your insights which have allowed me to simplify the code/explanations for my readers.
Transfer Learning using Keras
There are two main steps involved in transfer learning:
- Feature Extraction: Take a pre-trained model (and freeze its weights) as the base model and then train a new classifier* on top such that it outputs exactly N values (where N is the number of classes).
- [optional] Fine Tuning: Once the classifier is trained, unfreeze a few** layers from the base model so that it adapts well to the new dataset.
*The new classifier can be:
- a stack of Dense layers (i.e. fully connected layers).
OR
- a single global pooling layer (downsize each feature map to a single value using maxpool or avgpool). This is preferred because there is less overfitting as there are 0 parameters to optimize (and hence our choice for this article).
** A few can vary depending on how different your dataset is from the one the pre-trained model was initially trained on. Keep in mind that if the two datasets are quite similar, then it may be beneficial to only unfreeze a fraction of all layers.
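To illustrate the global pooling option mentioned above, here is a minimal pure-Python sketch of global average pooling on a tiny, made-up feature map. The function name and the numbers are hypothetical. In EfficientNetB0 the same operation collapses the final 7 x 7 x 1280 feature map into a vector of 1280 values.

```python
def global_average_pool(feature_maps):
    """Average an H x W x C nested list over H and W, returning C values."""
    height = len(feature_maps)
    width = len(feature_maps[0])
    channels = len(feature_maps[0][0])
    totals = [0.0] * channels
    for row in feature_maps:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]

# A tiny hypothetical 2 x 2 feature map with 3 channels.
fmap = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]],
]
print(global_average_pool(fmap))
```

Nothing in this operation is learned, which is the "0 parameters to optimize" property referred to above.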
The fine-tuning step, although optional, is quite crucial for use cases where your custom dataset is quite different from the dataset on which the base model was trained. Also, this may require more epochs compared to the feature extraction step. Because more epochs roughly translate to higher chances of overfitting, it is recommended to use early stopping (of model training) after careful monitoring of the loss/accuracy curves.
Intuition behind model selection
Coming to the million-dollar question: which model should we select as the base model for fine-tuning? Clearly, there are quite a few options available, as can be found on the Keras documentation here. While my initial choice was ResNet-50 due to its popularity, I finally decided to proceed with EfficientNet because these models can achieve results similar to SOTA models while requiring fewer FLOPS. Also, the paper mentions that their performance is on par with SOTA models on transfer learning tasks whilst requiring 9.6x fewer parameters on average. Wohoo ⭐️
There are quite a few flavors of the EfficientNet models (EfficientNetB0, EfficientNetB1, …… EfficientNetB7) and they differ slightly in architecture (i.e. network depth, width) and resource limitations. Each of these models expects images of a particular shape as described in this table. Given we are working with 224x224 resolution images, we will go with EfficientNetB0.
Model training for Feature Extraction step
Note: We will be using Tensorflow’s Keras API for this tutorial. If you are new to Keras, I have already written two beginner-level Keras tutorials (Part1, Part2) that cover network architecture, neurons, activation functions, hidden layers (Dense, Dropout, MaxPool, Flatten), etc. in much more detail than would be discussed here. Feel free to refer to them for a quick refresher!
We begin with creating an EfficientNetB0 base model using imagenet weights.
baseModel = EfficientNetB0(
    weights="imagenet",
    include_top=False, # make sure top layer is not included
    input_tensor=Input(shape=(224, 224, 3)),
    pooling="avg"
)
A few things to consider:
- include_top must be set to False because the top layer (i.e. the final layer) in the EfficientNet network architecture is a Dense layer that outputs 1000 classes corresponding to the ImageNet dataset. We clearly don’t need this!
- If you remember correctly, we opted for the new classifier to be a global pooling layer (instead of a stack of dense layers). Well, the good news is that the Keras API already allows us to do this whilst instantiating the EfficientNetB0 object. We can simply set the pooling parameter as avg. The default is None.
The next step is to freeze the weights by setting trainable for each layer as False:
# freeze the weights
for layer in baseModel.layers:
    layer.trainable = False
Now it’s time to create a new classifier on top which will spit out exactly two classes (M or O). To do so, we need to make sure the final layer of this classifier model is a Dense layer with two output neurons. In between, we have also included BatchNormalization and Dropout layers for regularization.
# training a new classifier on top (Functional Keras Model)
x = baseModel.output
Layer_1 = BatchNormalization()(x)
Layer_2 = Dropout(0.5)(Layer_1)
output_layer = Dense(len(CLASSES), activation="softmax")(Layer_2)

model = Model(inputs = baseModel.input, outputs = output_layer)
Note: There are two ways to build this Keras classifier model: sequential (most basic one) and functional (for complex networks with multiple inputs/outputs). The code snippet above is written as a functional network because it lends more clarity to the network architecture if you were to check it using model.summary(). Likewise, we could have created a Sequential model like the one below and the results would be the same.
# Another way to create the classifier on top of basemodel
model = tf.keras.Sequential()
model.add(baseModel)
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(len(CLASSES), activation="softmax"))
Finally, let's compile the model with the Adam optimizer and a relatively large learning_rate = 1e-3. Since we have two possible output classes, we will be monitoring the binary_crossentropy loss (use categorical_crossentropy if you're dealing with more than two classes) and assessing the model's usefulness based on the AUC metric implemented in tf.keras.metrics.AUC.
# compile
opt = Adam(learning_rate=1e-3)
model.compile(
    optimizer=opt,
    loss='binary_crossentropy',
    metrics=[tf.keras.metrics.AUC()]
)
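A quick numeric check helps explain why binary_crossentropy is a reasonable choice even though the output layer has two softmax units. The sketch below uses hand-written versions of both losses (simplified, without the clipping and batching Keras applies) on one hypothetical prediction, and shows that they coincide when there are exactly two classes and the probabilities sum to 1.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # Only the probability assigned to the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def binary_crossentropy(y_true, y_pred):
    # Treats each output unit as its own yes/no problem and averages over units.
    per_unit = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
                for t, p in zip(y_true, y_pred)]
    return sum(per_unit) / len(per_unit)

y_true = [1.0, 0.0]  # one-hot label, e.g. class M
y_pred = [0.8, 0.2]  # hypothetical softmax output, sums to 1

cce = categorical_crossentropy(y_true, y_pred)
bce = binary_crossentropy(y_true, y_pred)
print(round(cce, 4), round(bce, 4))
```

With more than two classes the two losses no longer agree, which is why categorical_crossentropy is the recommended choice in that case.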
One last thing to do before training the model using fit() is implementing EarlyStopping and ModelCheckpoint.
The former will ensure that the model does not train for more epochs than necessary. This is done by monitoring val_loss: once it shows no further improvement for the number of epochs set by patience, training is stopped.
The latter will save the best model at the given file path, in our case feature_extraction.h5. We are again going to monitor validation loss and save the best model from all epochs.
Note: Here’s an excellent article explaining both EarlyStopping and ModelCheckpoint implementation in more detail!
# implementing early stopping
es = EarlyStopping(
    monitor='val_loss', # metric to monitor
    mode='min', # whether to min or max the metric monitored
    patience=10, # epochs to wait for an improvement before stopping training
    verbose=1 # output epoch when training was stopped
)

# implementing model checkpoint
mc = ModelCheckpoint(
    'feature_extraction.h5',
    monitor='val_loss',
    mode='min',
    verbose=1, # display a message every time the model is saved
    save_best_only=True
)
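The interplay between patience and the best checkpoint can be sketched in a few lines of plain Python. This is a simplified model of the behaviour described above (the function name and the loss values are hypothetical), not the actual Keras callback code.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), 0-indexed; stop_epoch is None if training runs to the end."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember it (this is when a checkpoint would be saved).
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses, one per epoch.
losses = [0.70, 0.62, 0.58, 0.60, 0.59, 0.61, 0.57]
print(simulate_early_stopping(losses, patience=3))
print(simulate_early_stopping(losses, patience=10))
```

With a small patience, training stops before the late improvement at the last epoch is ever seen; with a larger patience, that epoch becomes the saved checkpoint. This trade-off is why patience is worth tuning alongside the number of epochs.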
Finally, it’s time for model training:
# Training the model
hist = model.fit(
    x=trainGen,
    epochs=25,
    verbose=2,
    validation_data=valGen,
    steps_per_epoch=totalTrain // BATCH_SIZE,
    callbacks=[es, mc]
)
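It is worth working out what steps_per_epoch evaluates to with the counts from this article. The arithmetic below is a plain-Python check; the variable names are chosen for illustration only.

```python
import math

BATCH_SIZE = 32
total_train = 344  # training images, as counted above

# Floor division drops the final partial batch in each epoch.
steps_per_epoch = total_train // BATCH_SIZE
images_per_epoch = steps_per_epoch * BATCH_SIZE

# Rounding up instead would include the last, smaller batch.
steps_to_cover_all = math.ceil(total_train / BATCH_SIZE)

print(steps_per_epoch, images_per_epoch, steps_to_cover_all)
```

So each epoch draws 10 batches, i.e. at most 320 of the 344 training images. Because the generator shuffles, the images left out should differ from epoch to epoch.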
Taking a quick look at the AUC and loss curves, we can find evidence for model convergence (meaning the model is ready for the fine-tuning step).
One of the interesting observations from the graph on the right was that our validation loss was lower than training loss. At first, I thought there was some data leakage issue but then I found this excellent article that explained why this is totally normal and can sometimes happen during training.
To summarize two possible reasons (from the article itself):
- Reason #1: Regularization (such as Dropout) is applied only during training and not during validation. Since regularization sacrifices training accuracy to improve validation/test accuracy, validation loss can go lower than train loss.
- Reason #2: Our validation set is too small (only 61 images) and perhaps it was easier than the training set, i.e. an unrepresentative validation dataset.
Model Testing after Feature Extraction Step
We are going to use some boilerplate code for evaluating the model predictions obtained using .predict(). Bear in mind that each row of predIdxs will be something like [0.8, 0.2], i.e. the softmax values for classes M and O, so make sure you pick the index of the larger of the two using np.argmax. We use testGen.class_indices to check the mapping from class names to class indices.
testGen.reset()

predIdxs = model.predict(
    x=testGen,
    steps=(totalTest // BATCH_SIZE) + 1
)

predIdxs = np.argmax(predIdxs, axis = 1)
print("No. of test images", len(predIdxs))
print(testGen.class_indices)

cm = confusion_matrix(testGen.classes, predIdxs)
heatmap = sns.heatmap(cm, annot=True)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

********* OUTPUT********
No. of test images 46
{'M': 0, 'O': 1}
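For readers who want to trace the evaluation logic by hand, here is a dependency-free sketch of the same steps on four made-up predictions. The probabilities and true classes are hypothetical; only the class mapping (0 = M, 1 = O) and the batch arithmetic come from the article.

```python
# Hypothetical softmax outputs for four test images: [P(M), P(O)].
probabilities = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]]
true_classes = [0, 1, 1, 1]  # 0 = M, 1 = O, following class_indices above

# Equivalent of np.argmax(..., axis=1): index of the larger probability per row.
predicted = [row.index(max(row)) for row in probabilities]

# Rows are actual classes, columns are predicted classes.
confusion = [[0, 0], [0, 0]]
for actual, pred in zip(true_classes, predicted):
    confusion[actual][pred] += 1

# Same rule as the steps argument above: 46 images in batches of 32.
steps = (46 // 32) + 1

print(predicted, confusion, steps)
```

The diagonal of the confusion matrix counts correct predictions, and two prediction steps are enough to cover all 46 test images (one full batch of 32 plus a partial batch of 14).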
Model Training for Fine-Tuning step
We will begin by unfreezing the last few layers of the current model. However, one shouldn’t go about randomly turning layers on or off. There are numerous techniques and tips available for fine-tuning models (see this and this as examples) but these are the ones I found most useful:
- When compiling the model at this step, use an even smaller learning rate compared to the feature extraction step. A smaller learning rate means more epochs will be needed since smaller changes will be made to the network weights on each update.
- The BatchNormalization layers need to be kept frozen.
- Within a network architecture, a convolution block needs to be turned on or off in its entirety.
For instance: consider some of the last rows of the output from model.summary(). As you can see, the layers are neatly organized into blocks with block7a as the final block. As a starting point, we will unfreeze all layers in block7a (any BatchNormalization layers, however, will be left as is) plus the 7 following layers (most of which were defined by us when we built a new classifier head). In total, the last 20 layers from the network would be candidates for unfreezing.
____________________________________________________________________
Layer (type)                    Output Shape         Param #
====================================================================
.
.
.
block6d_project_conv (Conv2D)   (None, 7, 7, 192)    221184
____________________________________________________________________
block6d_project_bn (BatchNormal (None, 7, 7, 192)    768
____________________________________________________________________
block6d_drop (Dropout)          (None, 7, 7, 192)    0
____________________________________________________________________
block6d_add (Add)               (None, 7, 7, 192)    0
____________________________________________________________________
block7a_expand_conv (Conv2D)    (None, 7, 7, 1152)   221184
____________________________________________________________________
block7a_expand_bn (BatchNormali (None, 7, 7, 1152)   4608
____________________________________________________________________
block7a_expand_activation (Acti (None, 7, 7, 1152)   0
____________________________________________________________________
block7a_dwconv (DepthwiseConv2D (None, 7, 7, 1152)   10368
____________________________________________________________________
.
.
.
I have bundled the code for fine-tuning into a function called fine_tune_model(). Most of the code is repeated from the feature extraction step.
def fine_tune_model(model):
    # unfreeze last conv block i.e. block7a
    for layer in model.layers[-20:]:
        if not isinstance(layer, BatchNormalization):
            layer.trainable = True

    # check which of these are trainable and which aren't
    for layer in model.layers:
        print("{}: {}".format(layer, layer.trainable))

    # compile (with an even smaller learning rate)
    opt = Adam(learning_rate=1e-5)
    model.compile(
        optimizer=opt,
        loss='binary_crossentropy',
        metrics=[tf.keras.metrics.AUC()]
    )
    return model

model_fine_tuned = fine_tune_model(model)
Because fine-tuning will also make use of the same data generators, i.e. trainGen, valGen, and testGen, it is important to reset them so they start with the very first sample in the dataset.
trainGen.reset()
valGen.reset()
testGen.reset()
Finally, let’s set the early stopping and model checkpoint (notice we have increased patience to 20 as we are now going to train for longer, i.e. 50 epochs) and get the training started.
# implementing early stopping
es_tune = EarlyStopping(
    monitor='val_loss',
    mode='min',
    patience=20,
    verbose=1
)

# implementing model checkpoint
mc_tune = ModelCheckpoint(
    'fine_tuned_house.h5',
    monitor='val_loss',
    mode='min',
    verbose=1,
    save_best_only=True
)

hist = model_fine_tuned.fit(
    x=trainGen,
    steps_per_epoch=totalTrain // BATCH_SIZE,
    validation_data=valGen,
    epochs=50,
    verbose=2,
    callbacks=[es_tune, mc_tune]
)
Model Testing after Fine-Tuning Step
Upon comparing it with the previous confusion matrix, we have only managed to increase the number of correctly predicted images by 2 (see diagonal values in both heatmaps).
As a final sanity check, it’s also good to see whether this fine-tuning step shows any signs of overfitting.
The validation loss is not volatile and is stable at around 0.55, indicating the model has not been overfitted. In general, the AUC of the validation set predictions does get better with more epochs but with diminishing returns. (In simpler words, doesn't seem like training for longer would help our case substantially).
At first, I thought the fluctuations in the training curve were due to the batch size, since it plays a role in how the network learns. Similarly, a too-large learning rate can deter convergence and cause the loss function to fluctuate and get stuck in local minima. However, neither increasing the batch size nor decreasing the learning rate helped in smoothing the curve.
Another possible explanation that comes to mind is that the network has reached its capacity with respect to the given dataset i.e. it can learn no more from it. This is possible since we are trying to train a relatively large network (remember we have unfrozen some additional layers meaning more trainable parameters exist) using only 344 samples which are unable to provide sufficient information to learn the problem (any further).
Note: Before shoving more images into the training process in the hopes of improving the model, it might be worth tinkering around with model hyperparameters, train:val split, choice of pre-trained weights (weights from noisy student training are known to be better than those from ImageNet training), and the network architecture itself.
Future Work
It has been established in this recent paper and described in this video that joint training with both unlabelled and labeled datasets outperforms the pipeline wherein we first pre-train with unlabelled data and then fine-tune on labeled data. This is known as semi-supervised learning and will be the focus of our next tutorial. This would allow us to make full use of the remaining images in our dataset for which it was difficult to obtain labels.
Kudos for sticking around so long. 🥂
Head over to Part 2 for learning how to take this trained model and wrap it within a flask app. We are also going to write a quick and dirty front end and finally deploy the app on Heroku.
Until then :)