
Dog breed classification using Deep Learning concepts

Developing ideas for a dog identification app

derivative work: Djmirko (talk), YellowLabradorLooking new, neural transfer learning, CC BY-SA 3.0

This blog post is part of the Udacity Data Scientist Nanodegree program.

Introduction

The World Canine Organization (FCI) currently lists more than 300 officially recognised dog breeds. Over thousands of years, mankind has created an impressive diversity of canine phenotypes and an almost uncanny range of physical and behavioural characteristics in its faithful four-legged friends. However, apart from cynologists, dog breeders and devoted dog lovers, most people simply shrug when asked to name the breed of a randomly presented dog, unless it happens to belong to one of the most popular and well-known breeds such as the Dachshund, German Shepherd or Pug. If you are one of the few people who find it slightly embarrassing not to be able to identify dogs like a cynologist, you will probably be pleased to learn that there might be a technical solution. Thankfully, the field of deep learning and artificial neural networks provides powerful concepts and methods for this kind of classification task.

In this project we will develop ideas for a dog identification app using deep learning concepts. The software is intended to accept any user-supplied image as input. If a dog is detected in the image, it will provide an estimate of the dog’s breed. If a human is detected, it will provide an estimate of the dog breed that the person most resembles.

Our project involves the following steps which will be covered in detail in the subsequent sections of this blog post.

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 6: Write your Algorithm
  • Step 7: Test Your Algorithm

Step 0: Import Datasets

Obviously, to be able to build an algorithm intended to identify dogs we will need some "dog data". A lot of it. Thankfully, for this project Udacity is providing a decent number of dog images including the corresponding breed labels. Concretely, the image data comprises 8351 dog images and 133 separate dog breed names.

Since the app has the additional task of assigning the most closely resembling dog breed to a given human face, we also need a dataset of human faces. The dataset provided by Udacity includes 13233 images from the Labeled Faces in the Wild (LFW) dataset.

Step 1: Detect Humans

Detecting humans may seem a surprising step in the development of a dog identification app, but it is needed for the app’s extra job of assigning the most closely resembling dog breed to a given human face.

In order to detect human faces in images we will use OpenCV’s implementation of Haar feature-based cascade classifiers. This kind of classifier is based on Haar-like features, which are widely used in object detection because they are very fast to compute.

After instantiating a new pre-trained classifier, an image is loaded and converted to grayscale. Applying the classifier to the image gives us the bounding boxes of detected human faces.
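A minimal sketch of such a face detector, assuming the `opencv-python` package and the frontal-face cascade file bundled with it. The function name `face_detector` and the choice of `haarcascade_frontalface_alt.xml` are assumptions for illustration, not necessarily what the project uses:

```python
def face_detector(img_path, cascade_path=None):
    """Return True if at least one human face is found in the image at img_path."""
    import cv2  # imported lazily; requires the opencv-python package

    if cascade_path is None:
        # Assumption: use the frontal-face cascade that ships with opencv-python.
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_alt.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    img = cv2.imread(img_path)  # BGR image, or None if the file cannot be read
    if img is None:
        raise FileNotFoundError(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # the cascade expects grayscale
    faces = face_cascade.detectMultiScale(gray)   # one (x, y, w, h) box per face
    return len(faces) > 0
```

Each entry returned by `detectMultiScale` is a bounding box, so the same call can also be used to draw rectangles around the detected faces.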

Here are a few examples from the Labeled Faces in the Wild dataset used in this project after running them through our cascade classifier:

image source: Labeled Faces in the Wild, http://vis-www.cs.umass.edu/lfw/

Assessing the human detector

Let’s now take a look at how the classifier performs with the pictures from our datasets. We apply the algorithm to 100 of our dog images and take a closer look at the 12 pictures in which the classifier reported a human face. Slightly disappointingly, we find hardly any bizarre or uncanny human features in the dogs’ faces that could have fooled our algorithm.

image source: dog image dataset provided by Udacity

Conversely, the classifier missed human faces in the following 2 of the 100 samples of human pictures used in our assessment:

image source: Labeled Faces in the Wild, http://vis-www.cs.umass.edu/lfw/

Still, our classifier seems reliable enough to give it a go in our project.

Step 2: Detect Dogs

Now that we have a pretty decent algorithm to detect human faces in images, we surely want to build a similar function for dog detection. Unfortunately, at the moment there is no comparable "dog detector" available among OpenCV’s pre-trained cascade classifiers. Therefore, we choose another approach and employ an image classification model that has been pre-trained on the vast ImageNet image database. More specifically, we will use the high-level deep learning API Keras to load the ResNet-50 convolutional neural network and run images through this model. For a given image the network predicts a probability for each of 1000 image categories. We count an image as a positive dog detection if the model assigns the maximum probability to one of the 118 dog-related categories.

The source code below lists the functions used to preprocess the image data and run it through the ResNet-50 model.
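A minimal sketch of these functions, assuming a recent TensorFlow 2.x installation. The function names are hypothetical, and the index range 151 to 268 for the dog categories is an assumption about the ImageNet class ordering that is consistent with the 118 categories mentioned above:

```python
# Assumption: ImageNet class indices 151..268 (inclusive) are the dog categories.
DOG_CLASS_RANGE = range(151, 269)


def is_dog_index(class_index):
    """Decision rule: is the most probable ImageNet class a dog category?"""
    return class_index in DOG_CLASS_RANGE


def path_to_tensor(img_path):
    """Load an image and return a float array of shape (1, 224, 224, 3)."""
    import numpy as np
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(img_path, target_size=(224, 224))  # resized PIL image
    return np.expand_dims(img_to_array(img), axis=0)  # add the batch dimension


def dog_detector(img_path, model=None):
    """Return True if ResNet-50 assigns its top probability to a dog category."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

    if model is None:
        model = ResNet50(weights="imagenet")  # downloads the weights on first use
    probabilities = model.predict(preprocess_input(path_to_tensor(img_path)))
    return is_dog_index(int(np.argmax(probabilities)))
```

When many images are checked, it is worth creating the `ResNet50` model once and passing it in through the `model` argument instead of reloading it on every call.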

Assessing the dog detector

How does the ResNet-50 dog detector perform on our image datasets? We are going to test this with the source code below.
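A minimal sketch of such an assessment: a hypothetical helper `detection_rate` that applies any detector function to a list of image paths and reports the fraction of positive detections. The same helper would work for the face detector from step 1.

```python
def detection_rate(detector, image_paths):
    """Return the fraction of images for which detector(path) is truthy."""
    paths = list(image_paths)
    if not paths:
        raise ValueError("image_paths must not be empty")
    hits = sum(1 for path in paths if detector(path))
    return hits / len(paths)
```

Applied to a sample of dog images, the result is the share of dogs that were found; applied to a sample of human images, it is the share of false alarms.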

The detector finds a dog in 100 % of our dog images. Frank Solich, however, might be worried: the only dog that this groundbreaking deep learning model spotted in our sample of human images was in a portrait of him:

Frank Solich, image source: Labeled Faces in the Wild, http://vis-www.cs.umass.edu/lfw/

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now we come to the really interesting part and tackle the app’s principal task: predicting the correct breed label from an image of a dog. We could make things easy and just use the pre-trained model from step 2 and predict the dog breed labels defined in the categories of the ImageNet dataset. But of course it’s much more exciting, interesting and educational to build our own solution, so here we go! Before we start building our own classifier, a few words about convolutional neural networks.

Convolutional neural networks (CNNs) are a class of deep neural networks primarily used in the analysis of images. To a certain extent, the design of convolutional networks was inspired by the way in which a mammal’s brain processes visual impressions. Translation invariance and shared weights are the properties most often cited to explain the advantages of CNNs over other types of neural networks in image analysis. The architecture of a convolutional network involves multiple hidden layers that perform mathematical convolution operations on their input.
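To make the convolution operation and the idea of shared weights concrete, here is a small dependency-free sketch. The function name `conv2d_valid` is hypothetical. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation:

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding) and return the feature map.

    image and kernel are 2D nested lists; the same kernel weights are reused
    at every position, which is what "shared weights" refers to.
    """
    k_height, k_width = len(kernel), len(kernel[0])
    out_height = len(image) - k_height + 1
    out_width = len(image[0]) - k_width + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_height)
                for dj in range(k_width)
            )
            for j in range(out_width)
        ]
        for i in range(out_height)
    ]


# A tiny image with a vertical edge between the second and third column.
image = [[0, 0, 1, 1]] * 3
edge_filter = [[-1, 1]]  # responds where brightness increases from left to right
feature_map = conv2d_valid(image, edge_filter)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

If the edge moves one column to the right, the response in the feature map moves with it, while the filter weights stay the same.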

First try

A typical example of the structure of a CNN is provided by Udacity, which suggests the following model for use in the present step.

So we have an input layer into which image data are fed and a total of three pairs of convolutional and pooling layers before a fully connected "dense" layer produces an output. Convolutional layers consist of a set of filters of a certain height and width, while the pooling layers have the task of reducing the spatial dimensions of their input. Typically, the number of filters in each convolutional layer increases while the spatial size of the processed data decreases. Because the performance of a model typically improves with increasing depth, we add two additional convolution and pooling stages to the model proposed by Udacity.

The source code for creating our model with the Keras library looks like this:
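A minimal sketch of such a model with five convolution and pooling stages, assuming TensorFlow 2.x. The filter counts, kernel size, dropout rate and the global average pooling layer are assumptions for illustration; only the number of stages and the 133 output classes are taken from the text:

```python
FILTERS = [16, 32, 64, 128, 256]  # assumed: filter count doubles at each stage


def build_scratch_model(num_classes=133, input_shape=(224, 224, 3), dropout_rate=0.3):
    """Build a small CNN with one convolution + pooling pair per entry in FILTERS."""
    from tensorflow.keras import layers, models

    model = models.Sequential([layers.Input(shape=input_shape)])
    for n_filters in FILTERS:
        model.add(layers.Conv2D(n_filters, kernel_size=2, padding="same",
                                activation="relu"))
        model.add(layers.MaxPooling2D(pool_size=2))  # halves height and width
    model.add(layers.GlobalAveragePooling2D())       # one value per filter
    model.add(layers.Dropout(dropout_rate))          # regularisation before the output
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

With a 224 x 224 input, the five pooling layers reduce the spatial size to 7 x 7 before the global average pooling layer collapses it to a vector of 256 values.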

Before producing an output we insert an additional dropout layer which randomly deactivates some of the neurons during training. The use of dropout layers is a common regularisation method to prevent overfitting to the training data.
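What a dropout layer does during training can be sketched in a few lines of plain Python. This is an illustration of the "inverted dropout" variant, in which the surviving activations are scaled up so that their expected sum stays the same; the function name `dropout` is hypothetical:

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_probability = 1.0 - rate
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
# Roughly half of the values are now 0.0, the others have been scaled to 2.0.
```

At prediction time no units are dropped, which corresponds to calling the function with `rate=0.0`.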

Our first model finally looks like this:

CNN from scratch (plotted with Netron)

Let’s now train our model by running our training set through our network 30 times:
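A minimal sketch of this training step for any compiled-or-not Keras model. The 30 epochs come from the text; the optimizer, batch size and the argument names for the data tensors are assumptions:

```python
EPOCHS = 30  # from the text: the training set is run through the network 30 times


def train(model, train_x, train_y, valid_x, valid_y, epochs=EPOCHS, batch_size=20):
    """Compile and fit a Keras-style model; return the training history."""
    model.compile(
        optimizer="rmsprop",              # assumed optimizer
        loss="categorical_crossentropy",  # targets are one-hot encoded breed labels
        metrics=["accuracy"],
    )
    return model.fit(
        train_x,
        train_y,
        validation_data=(valid_x, valid_y),  # evaluated after every epoch
        epochs=epochs,
        batch_size=batch_size,
    )
```

The returned history object holds the per-epoch training and validation metrics, which is what progress plots of accuracy over epochs are usually drawn from.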

Let’s take a look at the progress our training made in each epoch:

We can see an almost linear increase in accuracy. By the end of the training we reach roughly 23 % on our training set with only moderate overfitting: the validation accuracy lags behind the training accuracy, but not by much. An additional check with the held-out test data gives an accuracy of 16.5 %.

Not bad, given that random guessing among 133 breeds would be right less than 1 % of the time, but certainly not accurate enough for a serious application. So let’s see if we can do any better.

Second try with AlexNet

Now we’re going to try again with a true classic in the world of CNN models. AlexNet is a CNN model that outclassed its competitors in the 2012 ImageNet Large Scale Visual Recognition Challenge and popularised some groundbreaking concepts, such as the ReLU activation function and the use of dropout layers to prevent overfitting. The general structure of the model is as follows:

AlexNet (plotted with Netron)

Let’s implement an AlexNet model with the Keras library:
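A minimal sketch of an AlexNet-style model, assuming TensorFlow 2.x. It follows the layer sizes of the original architecture but leaves out the local response normalisation layers, and the 227 x 227 input size and 133 output classes are assumptions for this project:

```python
# (filters, kernel_size, stride, followed_by_max_pooling) for the five conv layers
ALEXNET_CONV_LAYERS = [
    (96, 11, 4, True),
    (256, 5, 1, True),
    (384, 3, 1, False),
    (384, 3, 1, False),
    (256, 3, 1, True),
]


def build_alexnet(num_classes=133, input_shape=(227, 227, 3)):
    """Build an AlexNet-style CNN (without local response normalisation)."""
    from tensorflow.keras import layers, models

    model = models.Sequential([layers.Input(shape=input_shape)])
    for i, (filters, kernel, stride, pool) in enumerate(ALEXNET_CONV_LAYERS):
        padding = "valid" if i == 0 else "same"  # only the first layer shrinks the map
        model.add(layers.Conv2D(filters, kernel, strides=stride, padding=padding,
                                activation="relu"))
        if pool:
            model.add(layers.MaxPooling2D(pool_size=3, strides=2))
    model.add(layers.Flatten())
    for _ in range(2):  # two large fully connected layers, each followed by dropout
        model.add(layers.Dense(4096, activation="relu"))
        model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

The two dense layers with 4096 units account for most of the parameters, which is one reason why this kind of model overfits easily on a few thousand images.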

As in the first attempt, we run our training data 30 times through our network and achieve the following progress.

So, wow, the training accuracy is shooting upwards toward the end, beating the results from our first attempt. But, oh dear, what is happening with the validation curve? We are clearly dealing with an overfitting problem.

Third try: tackling overfitting with data augmentation

In addition to using dropout layers, we can use another popular method to get our problem under control. Data augmentation is a technique to increase the diversity of the training set by applying random transformations such as image rotation, image shifting, varying image brightness and image flipping. So let’s try this method in our next attempt:
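The idea can be illustrated with a dependency-free sketch that works on a grayscale image stored as a nested list. The function names and parameter ranges are hypothetical, and rotation is left out to keep the example short:

```python
import random


def hflip(img):
    """Mirror the image horizontally."""
    return [row[::-1] for row in img]


def shift_horizontal(img, dx, fill=0):
    """Shift pixels dx columns to the right (left if negative), padding with fill."""
    width = len(img[0])
    dx = max(-width, min(width, dx))
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]


def adjust_brightness(img, factor, max_value=255):
    """Scale all pixel values by factor and clip them to 0..max_value."""
    return [[min(max_value, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng):
    """Apply a random flip, a random shift and a random brightness change."""
    out = hflip(img) if rng.random() < 0.5 else [row[:] for row in img]
    out = shift_horizontal(out, rng.randint(-1, 1))
    return adjust_brightness(out, rng.uniform(0.8, 1.2))
```

Because a new random variant is generated every time an image is drawn, the network rarely sees exactly the same pixels twice. In Keras, this kind of on-the-fly augmentation has typically been done with the `ImageDataGenerator` class.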

Ok, and now let’s check the progress history:

Yep, that looks much better now, even if avoiding the overfitting issue was clearly at the expense of accuracy: we only achieve 10.9 % with our test dataset. But if we compare the trajectories of the graphs from our first and third attempts, AlexNet seems to have a steeper curve and could soon overtake the model from our first try given additional epochs.

But all in all, building a CNN model from scratch seems to be quite complex, tedious and time-consuming, and it requires a lot of patience and a lot of computing power. So, let’s look at a better method in the next step.

Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

The general idea behind transfer learning is that it is much easier to teach specialized skills to a subject that already has basic knowledge of the domain. There are a lot of neural network models out there that already specialize in image recognition and have been trained on a huge amount of data. Our strategy now is to take advantage of such pre-trained networks, and our plan can be outlined as follows:

  • find a network model pre-trained for a general image classification task
  • load the model with the pre-trained weights
  • drop the "top of the model", i.e. the section with the fully connected layers, because the specific task of a model is generally defined by this part of the network
  • run the new data through the convolutional part of the pre-trained model (this is also called feature extraction, and the output of this step is also called bottleneck features)
  • create a new network to define the specific task at hand and train it with the output (the bottleneck features) of the previous step.
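The first four points of this plan can be sketched as follows, assuming TensorFlow 2.x and VGG16 as the pre-trained network. The function name `extract_bottleneck_features` is hypothetical:

```python
BOTTLENECK_SHAPE = (7, 7, 512)  # VGG16 output per 224 x 224 image without its top


def extract_bottleneck_features(images, base_model=None):
    """Run a batch of images through the convolutional part of VGG16.

    images: array-like of shape (n, 224, 224, 3) with RGB values in 0..255.
    Returns an array of shape (n, 7, 7, 512).
    """
    import numpy as np
    from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

    if base_model is None:
        # include_top=False drops the fully connected layers of the original model.
        base_model = VGG16(weights="imagenet", include_top=False)
    batch = preprocess_input(np.array(images, dtype="float32"))
    return base_model.predict(batch)
```

Since the pre-trained weights never change, the features only have to be computed once and can be stored on disk before the new network is trained on them.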

As we will see in a moment, the model into which we feed the bottleneck features can usually be quite simple, because a large part of the training work has already been done by the pre-trained model. In step 4 of this project Udacity provides a kind of blueprint for this strategy: our image dataset has already been fed into a pre-trained VGG16 model (another classic among CNN models for image classification), and the output is made available as bottleneck features. We can now feed these into a very simple network that essentially consists of just one global average pooling layer and a final dense output layer.

The following source code loads the bottleneck features, defines the top layers for our specific classification task, and trains these new layers with the bottleneck features:
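A minimal sketch of the new top layers, assuming TensorFlow 2.x and VGG16 bottleneck features of shape 7 x 7 x 512 per image. The function names are hypothetical, and loading the stored features is left out because their file location is not given here:

```python
def build_top_model(feature_shape=(7, 7, 512), num_classes=133):
    """Global average pooling followed by a softmax output layer."""
    from tensorflow.keras import layers, models

    return models.Sequential([
        layers.Input(shape=feature_shape),
        layers.GlobalAveragePooling2D(),  # (7, 7, 512) -> (512,)
        layers.Dense(num_classes, activation="softmax"),
    ])


def top_model_param_count(channels=512, num_classes=133):
    """Trainable parameters of the dense layer: weights plus biases."""
    return channels * num_classes + num_classes
```

With these shapes the new layers have only 68,229 trainable parameters, which explains why training on the bottleneck features is so fast.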

Again, let’s take a look at the progress history:

Apart from the rapid training speed we observe a remarkable performance in terms of accuracy and achieve about 75 % with our test data, albeit at the expense of an obvious overfitting problem.

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

We will now take step 4 as a template and define our own CNN using transfer learning. We choose InceptionV3 as the network that should provide us with the features for our training layers. Inception is another high performing model on the ImageNet dataset and its power lies in the fact that the network could be designed much deeper than other models by introducing subnetworks called inception modules.

The source code looks quite similar to the one from step 4:

To prevent the overfitting issues we observed in step 4, we inserted an additional dropout layer and added batch normalization before the output layer.
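A minimal sketch of such a top model, assuming TensorFlow 2.x and InceptionV3 bottleneck features of shape 5 x 5 x 2048 per 224 x 224 image. The dropout rate and the exact order of the dropout and batch normalization layers are assumptions:

```python
INCEPTION_FEATURE_SHAPE = (5, 5, 2048)  # InceptionV3 output without its top


def build_inception_top(feature_shape=INCEPTION_FEATURE_SHAPE, num_classes=133,
                        dropout_rate=0.4):
    """Top layers with dropout and batch normalization before the output."""
    from tensorflow.keras import layers, models

    return models.Sequential([
        layers.Input(shape=feature_shape),
        layers.GlobalAveragePooling2D(),  # (5, 5, 2048) -> (2048,)
        layers.Dropout(dropout_rate),     # assumed rate
        layers.BatchNormalization(),      # normalises the pooled features
        layers.Dense(num_classes, activation="softmax"),
    ])
```

Both extra layers behave differently during training and prediction: dropout is switched off at prediction time, and batch normalization then uses the running statistics collected during training.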

As we can see from the progress history, we still have some overfitting problems, but we also notice another improvement of accuracy. We achieve 83 % accuracy with our test data.

To make our model even better, we could consider the following options:

  • using data augmentation to prevent overfitting
  • adding layers to our simple training model
  • getting more training data

But for now, we are quite satisfied with the results from our latest attempt and use them in the algorithm we are going to write and test in the following steps.

Step 6: Write your Algorithm

So, let’s now collect the achievements and findings from the previous steps and write an algorithm that takes an image of a dog or a human and spits out a dog breed along with 4 sample images of the specific breed.
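The control flow of such an algorithm can be sketched independently of the concrete models. The detectors and the breed predictor are passed in as functions; all names, and the `breed_samples` dictionary that maps a breed name to example image paths, are hypothetical:

```python
def identify_breed(img_path, dog_detector, face_detector, predict_breed,
                   breed_samples=None, n_samples=4):
    """Return (kind, breed, sample_paths) for the image at img_path.

    kind is "dog", "human" or None; breed is None when nothing was detected.
    """
    if dog_detector(img_path):       # checked first, see the note below
        kind = "dog"
    elif face_detector(img_path):
        kind = "human"
    else:
        return None, None, []
    breed = predict_breed(img_path)  # e.g. the transfer learning model from step 5
    samples = (breed_samples or {}).get(breed, [])[:n_samples]
    return kind, breed, samples
```

The dog detector is consulted first because, in the assessments from steps 1 and 2, the face detector reported human faces in 12 of 100 dog images, while the dog detector raised only a single false alarm on the human images.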

Step 7: Test Your Algorithm

Finally, let’s test our algorithm with a few test images.

Doberman Pinscher, image source: dog image dataset provided by Udacity
Collie, image source: dog image dataset provided by Udacity
image source: dog image dataset provided by Udacity

Conclusion

In this project we developed several approaches to building an app that identifies dog breeds, and we achieved our best results with a transfer learning model, which reached an accuracy of 83 % in our tests. We also learned how to build convolutional networks from scratch, which was a very educational undertaking, even though we soon realized that there are significantly more promising methods, particularly transfer learning.

However, we still see several options to further improve our algorithm in the future:

  • We could gather more training data.
  • We could employ data augmentation to prevent overfitting.
  • We could add more layers to make our model more complex and hopefully more powerful.
  • We could extend our training time and add more epochs to the training.

But all in all, the accuracy levels from our tests, along with the tests with specific sample images, suggest that we already have a serious model we could work with in a real app.


The source code for this project was written in Python in a Jupyter Notebook and makes use of the popular deep learning libraries TensorFlow and Keras. You can find it in the corresponding GitHub repository.

Some of the source code has been provided by Udacity in this repository.

