
Hot dog or Not Hot dog

Trying out well-known CNN models with TensorFlow 2 to help Jian Yang

Output Image

Introduction

Have you watched HBO's comedy series "Silicon Valley"? If so, I bet you remember the Not Hotdog app that Jian Yang developed. Here is a clip to refresh your memory.

So basically, the app identifies whether something is a hot dog or not. Of course, the same approach can be trained to identify other types of objects as well.

When I learned about CNNs (Convolutional Neural Networks), I was eager to try out some popular CNN models on this problem, just for fun. So I chose a few of the best-known architectures.

I have used the following models for this problem,

  1. Variation of AlexNet
  2. Transfer learning with VGG19
  3. Transfer learning with ResNet50
  4. Transfer learning with Inception V3
  5. Transfer learning with Inception ResNet V2

You can view the notebook on NBViewer and find it on GitHub too,

case-studies/02. Not Hotdog at main · scsanjay/case-studies

The Data

Without good quality data, there is no machine learning. Thankfully, I found a dataset on Kaggle[2].

There are a total of 3,000 images available for training, of which 1,500 are hot dog images and 1,500 are not hot dogs (food, furniture, people, or pets). 20% of the training data will be used for validation, which means 600 images. The test set has 322 images from the hot dog category and 322 images from the not hot dog category.

While loading, I have resized all images to 256×256 and batched them with a batch size of 32.
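For reference, here is a minimal sketch of how the loading could look with tf.keras.utils.image_dataset_from_directory; the directory paths and the seed are illustrative, not the exact ones from the notebook.

```python
import tensorflow as tf

IMG_SIZE = (256, 256)
BATCH_SIZE = 32

# Training split (80% of the train folder); paths are illustrative.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "hotdog-nothotdog/train",
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=IMG_SIZE,      # resizes every image to 256x256
    batch_size=BATCH_SIZE,
    label_mode="binary",
)

# Validation split (the remaining 20%).
val_ds = tf.keras.utils.image_dataset_from_directory(
    "hotdog-nothotdog/train",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="binary",
)

# Held-out test set.
test_ds = tf.keras.utils.image_dataset_from_directory(
    "hotdog-nothotdog/test",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="binary",
)
```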

Let’s see some of the images,

Data Sample

Preprocessing

Resizing

One preprocessing step, converting all images to the same size, has already been done during loading.

Data Augmentation

Data augmentation is a very useful step. It helps the model generalize better. It also generates new images from each given image, which effectively increases the size of our dataset.

How? It performs operations like flipping, rotation, shear, and zoom to create augmented data. Note that TensorFlow does this on the fly, meaning we don't have to save the generated images; they are created at training time.

I have performed the following data augmentation operations: a) horizontal flip, b) rotation, c) zoom.
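A minimal sketch of these three augmentations with Keras preprocessing layers (the rotation and zoom factors are illustrative, not the exact values from the notebook):

```python
# Augmentation layers are only active during training; at inference they
# pass the input through unchanged.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
```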

After data augmentation, we can expect something like shown below for an image,

Augmented Data

Rescaling

We should rescale the pixel values from [0, 255] to [0, 1]. I will apply this only to the AlexNet variant.

In the case of transfer learning with pre-trained models, we will instead use each model's own preprocessing function.
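For reference, a minimal sketch of both options (the exact wiring in the notebook may differ):

```python
# Rescaling for the AlexNet-style model: [0, 255] -> [0, 1].
rescale = tf.keras.layers.Rescaling(1.0 / 255)

# For the transfer-learning models, each architecture's own preprocessing
# function from tf.keras.applications is used instead, e.g.:
vgg19_preprocess = tf.keras.applications.vgg19.preprocess_input
resnet50_preprocess = tf.keras.applications.resnet50.preprocess_input
inception_v3_preprocess = tf.keras.applications.inception_v3.preprocess_input
inception_resnet_v2_preprocess = tf.keras.applications.inception_resnet_v2.preprocess_input
```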

Model Building

AlexNet

AlexNet used three major concepts,

  1. Data augmentation – to increase the variance in the data.
  2. Dropout – to deal with overfitting.
  3. ReLU activation – to deal with the vanishing gradient problem.

I have created a variation of the AlexNet architecture, with Dropout and BatchNormalization added here and there. It has a total of 58,286,465 trainable params, and the last layer is a single sigmoid activation unit.
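For illustration, here is a rough sketch of what such an AlexNet-style network could look like. The layer sizes are indicative only and will not reproduce the exact parameter count above; the data augmentation block from earlier could be placed right after the input.

```python
def build_alexnet_variant(input_shape=(256, 256, 3)):
    """A rough AlexNet-style CNN with BatchNormalization and Dropout."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),                            # [0,255] -> [0,1]
        tf.keras.layers.Conv2D(96, 11, strides=4, activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Conv2D(256, 5, padding="same", activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(384, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(3, strides=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4096, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(4096, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(1, activation="sigmoid"),                  # hot dog / not hot dog
    ])
```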

I will optimize with the Adam optimizer using binary cross-entropy loss, and keep track of accuracy as well.

I have run it for 10 epochs, with an EarlyStopping callback that monitors val_loss. We get 69.00% accuracy on the validation data and 68.47% accuracy on the test data, which is not very encouraging. We could achieve more with tuning or more data.
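For reference, the training setup described above could look roughly like this, using the build_alexnet_variant sketch from before (the patience value is illustrative):

```python
model = build_alexnet_variant()
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),   # sigmoid output -> probabilities
    metrics=["accuracy"],
)

# Stop when validation loss stops improving (patience is illustrative).
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

history = model.fit(train_ds, validation_data=val_ds,
                    epochs=10, callbacks=[early_stop])
```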

Transfer learning with Pre-Trained VGG19

Transfer learning

Transfer learning is a very cool technique. We load a pre-trained model, remove its top (the last few dense layers), and freeze its convolutional base. We can then use the pre-trained model as a feature extractor.

Sometimes we may instead use the pre-trained model only for weight initialization and then train the whole model. This is usually done when we have a lot of data and/or the pre-trained model was not trained on similar data.

VGG19

VGG19 is from the Visual Geometry Group and has 19 layers. Its key idea was that using multiple small kernels instead of fewer large kernels decreases the number of trainable params. It also uses the same 3×3 kernels and 2×2 max pooling everywhere, which simplifies the architecture.

I have loaded the TensorFlow Keras pre-trained VGG19 model with ImageNet weights and without the top layers.

Then I have set this base model to be non-trainable, as sketched below.
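A minimal sketch of loading and freezing the convolutional base:

```python
# Pre-trained VGG19 feature extractor: ImageNet weights, no top layers.
base_model = tf.keras.applications.VGG19(
    weights="imagenet",
    include_top=False,
    input_shape=(256, 256, 3),
)
base_model.trainable = False   # freeze the convolutional base
```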

In the model architecture,

i) First, the input layer.
ii) Then the data augmentation layer; TensorFlow automatically ensures the augmentation runs only during training.
iii) Then the preprocessing of the VGG19 model.
iv) After that, the base model, i.e. the pre-trained VGG19, which is not trainable as per transfer learning.
v) Then a Global Average Pooling 2D layer.
vi) Then a Flatten layer.
vii) Then a fully connected Dense layer with 1000 ReLU activation units, with Dropout added for regularization.
viii) Finally, a single linear activation unit, because I will use from_logits=True in the binary cross-entropy loss, which applies the sigmoid internally.
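Putting the steps together, the model could be built roughly like this (the Dropout rate is illustrative; data_augmentation and base_model come from the sketches above):

```python
inputs = tf.keras.Input(shape=(256, 256, 3))
x = data_augmentation(inputs)                        # active only during training
x = tf.keras.applications.vgg19.preprocess_input(x)  # VGG19's own preprocessing
x = base_model(x, training=False)                    # frozen VGG19 feature extractor
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(1000, activation="relu")(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(1)(x)                # linear unit, i.e. a logit

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    # threshold 0.0 because the model outputs logits, not probabilities
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)
```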

It has 514,001 trainable params and 20,024,384 non-trainable params.

As before, we have used the Adam optimizer and binary cross-entropy, this time with from_logits=True.

This time we get 93.17% accuracy on the validation data and 93.47% accuracy on the test data, which is a big improvement over the AlexNet variant.

Transfer learning with Pre-Trained ResNet50

ResNet50 was created by He et al. and has 50 layers. It introduced the idea of residual blocks, which made it possible to build models with much greater depth. Residual blocks have skip connections, so if a block is not useful it can effectively be skipped.

We have followed the same steps as above to create the same structure. The only differences are the base model, which is now ResNet50, and the matching ResNet50 preprocessing.
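In code, only the base model and the preprocessing function change, roughly like this:

```python
# Swap the frozen base to ResNet50; everything else in the model stays the same.
base_model = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base_model.trainable = False
preprocess = tf.keras.applications.resnet50.preprocess_input
```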

It has 2,050,001 trainable params and 23,587,712 non-trainable params.

The compilation and fitting stage is similar to the above. This time we get 94.33% accuracy on the validation data and 94.56% accuracy on the test data, which is a small improvement over VGG19.

Transfer learning with Pre-Trained Inception V3

Inception V3 is the third version of the Inception network, developed by Google, and has 48 layers. Instead of applying one convolution at a time, an Inception block applies 1×1, 3×3, and 5×5 convolutions and max pooling in parallel. The idea is that smaller kernels capture local information while larger kernels capture more global information. It has one more trick, 1×1 bottleneck layers, which reduces the number of computations dramatically.
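For intuition, here is a toy sketch of the original Inception-style block described above; Inception V3 factorizes the larger convolutions further, and the filter counts here are purely illustrative.

```python
def toy_inception_block(x):
    """Parallel 1x1 / 3x3 / 5x5 convolutions and max pooling,
    with 1x1 'bottleneck' convolutions before the expensive branches."""
    b1 = tf.keras.layers.Conv2D(64, 1, padding="same", activation="relu")(x)

    b2 = tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu")(x)   # bottleneck
    b2 = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(b2)

    b3 = tf.keras.layers.Conv2D(16, 1, padding="same", activation="relu")(x)   # bottleneck
    b3 = tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu")(b3)

    b4 = tf.keras.layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu")(b4)

    # Concatenate the branch outputs along the channel axis.
    return tf.keras.layers.Concatenate()([b1, b2, b3, b4])
```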

Again, we have only changed the base model to Inception V3, along with its preprocessing step.

It has 2,050,001 trainable params (the same as ResNet50) and 21,802,784 non-trainable params.

The compilation and fitting stage is similar to the above. We get 92.67% accuracy on the validation data and 94.40% accuracy on the test data, which is slightly less than what we got with ResNet50.

Transfer learning with Pre-Trained Inception ResNet V2

Inception ResNet V2 was also developed by Google. It adds ResNet-style skip connections to the Inception network, which allowed a much deeper network of 164 layers.

I have made the same changes as above, swapping in the new base model and its preprocessing.
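As before, only the frozen base and its preprocessing module change, roughly:

```python
# Base model for the best-performing setup; the rest of the pipeline is unchanged.
base_model = tf.keras.applications.InceptionResNetV2(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))
base_model.trainable = False
preprocess = tf.keras.applications.inception_resnet_v2.preprocess_input
```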

It has 1,538,001 trainable params and 54,336,736 non-trainable params.

The compilation and fitting stage is similar to what we have seen so far. We get 95.33% accuracy on the validation data and 96.42% accuracy on the test data, which is the best so far.

Test Outputs

All four transfer learning models are performing similarly and pretty well.

However, Inception ResNet V2 has the highest accuracy in our case, so we will use it to show some outputs on test images.
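A small sketch of how such test outputs could be produced: the model emits a logit, so a sigmoid converts it to a class-1 probability. Which folder maps to class 1 depends on the dataset's directory names, so the class_names lookup below is an assumption to check against your own data.

```python
images, labels = next(iter(test_ds))                   # one batch of test images
probs = tf.sigmoid(model.predict(images)).numpy().ravel()
preds = (probs >= 0.5).astype(int)                     # 0 or 1 per image

class_names = train_ds.class_names                     # e.g. ['hot_dog', 'not_hot_dog']
for pred, prob in zip(preds[:5], probs[:5]):
    print(class_names[pred], f"(p(class 1) = {prob:.2f})")
```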

Test Output

Final Thoughts

If you look back, we did only a few things for each model: data augmentation, preprocessing, loading a pre-trained base model, and adding a few layers on top. And we were able to achieve a test accuracy of over 96%. This is the power of transfer learning. We could have made some more layers of the base model trainable (fine-tuning), which might have improved the accuracy further; a quick sketch of that idea is below.
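Continuing from the sketches above, fine-tuning could look roughly like this; the cut-off layer index and learning rate are illustrative.

```python
# Unfreeze only the top of the base model and retrain with a small learning rate.
base_model.trainable = True
for layer in base_model.layers[:-30]:        # keep the earlier layers frozen
    layer.trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.0)],
)
model.fit(train_ds, validation_data=val_ds, epochs=5)
```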

It was a lot of fun working on this project, and a great learning experience since it was my first deep learning project.

Deep learning is evolving very fast. New research papers get published every day, and only a few of them make it into production. Deep learning is all about being clever; every model we saw above introduced some new clever technique.

References

  1. TensorFlow [https://www.tensorflow.org/]
  2. Dataset: Jain, Yashvardhan. 2019. Hotdog-Not-Hotdog. Version 1. License: CC0 Public Domain. Available at: https://www.kaggle.com/yashvrdnjain/hotdognothotdog/metadata
  3. Applied Roots [https://www.appliedroots.com/]

Thanks for reading the blog! You can reach me on my LinkedIn Profile. Also 👏 if you liked it.

