Image Classification in Data Science

What image classification is in Data Science, and how to build our own image classifier in Python

Jason Dsouza
Towards Data Science


Welcome to part 2 of the “Data Science with Python” mini-series! In this lesson, we’ll talk about what image classification is and train an image classifier which can recognize cats from dogs (and vice-versa) with sizable accuracy.

What is Image Classification?

Photo by Joey Banks on Unsplash

Consider the image above. I’m sure we’d all agree that this is a car. But take a step back and analyze how you came to this conclusion — you were shown an image and you classified the class to which it belonged (a car, in this instance). That, in a nutshell, is what image classification is all about.

There are potentially countless categories into which a given image can be classified (for example, a car can be classified as a sedan, hatchback, SUV etc., and those, in turn, can be classified as an Audi, a Maserati or even a Toyota). Manually checking and classifying images is a very tedious process, and it becomes nearly impossible when you’re faced with a massive number of images, say 10,000 or even 100,000.

Image classification is the process of taking an input (like a picture) and outputting a class (like “cat”) or a probability that the input is a particular class (“there’s a 90% probability that this input is a cat”).

You can look at a picture and know that you’re looking at a cat, but how can a computer learn to do that?

With a convolutional neural network!

I’ve already written a post on CNNs — it’s worth a read especially if you’re new to Data Science.

As it turns out,

Computers don’t see images as humans do. They see a matrix of pixels, each of which has three components: red, green and blue (ever heard of RGB?)

Therefore, an image that is 1,000 pixels to us is 3,000 values to a computer. Each of those 3,000 values is an intensity assigned to one colour channel of one pixel. The result is a matrix of 3,000 precise pixel intensities, which the computer must somehow interpret as one or more objects.

For a black and white image, the pixels are interpreted as a 2D array (for example, a 2x2 grid of pixels). Every pixel has a value between 0 and 255: zero is completely black, 255 is completely white, and the greyscale lies between those numbers. Based on that information, the computer can begin to work on the data.

For a colour image (a combination of red, green and blue), this is a 3D array. Each colour channel has a value between 0 and 255, and the final colour of a pixel is found by combining the values in the three layers.
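
To make this concrete, here’s a minimal sketch (not from the original article) that uses NumPy and Pillow to show the arrays a computer actually sees; “photo.jpg” is just a hypothetical file name.

import numpy as np
from PIL import Image

# Load an image (hypothetical file name) and turn it into arrays of pixel intensities
img = Image.open("photo.jpg")

colour = np.array(img)               # 3D array: (height, width, 3) for an RGB image
grey = np.array(img.convert("L"))    # 2D array: (height, width), one value per pixel

print(colour.shape, grey.shape)
print(grey.min(), grey.max())        # intensities lie between 0 and 255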

Setting up the Structure of our Image Data

In data science, the image classifiers we build have to be trained to recognize objects or patterns; training shows our classifier exactly (or approximately) what to recognize.

Imagine I show you the following image:

Photo by Luca Bravo on Unsplash

There is a boathouse, a little boat, hills, a river and other objects. You’ll notice them, but you won’t know exactly what object I’m referring to. Now, if I show you another photo:

Photo by Benjamin Voros on Unsplash

and another:

Photo by Kurt Cotoaga on Unsplash

You’ll realize that what I want you to recognize is a “Mountain”. By showing you the first two images, I essentially trained your brain to recognize a mountain in the third image.

The process is (almost) the same for computers — our data needs to be in a specific format. We’ll see this in action later in the article, but just keep these pointers in mind.

Our model will be trained on the images in the “training” set, and label predictions will happen on the images in the testing set.

Building our model

For this article, I’ll be using Google Colab — a free cloud service based on Jupyter Notebooks that supports free GPU.

Colab provides a GPU and it’s totally free. Seriously!

The entire code in this article can be found here.

Importing packages & Loading Data

In this article, we’ll use a filtered version of the Dogs vs Cats dataset from Kaggle. You can download the archive version of the dataset from here.

The dataset has the following directory structure:

cats_and_dogs_filtered
|__ train
|______ cats: [cat.0.jpg, cat.1.jpg, cat.2.jpg ....]
|______ dogs: [dog.0.jpg, dog.1.jpg, dog.2.jpg ...]
|__ validation
|______ cats: [cat.2000.jpg, cat.2001.jpg, cat.2002.jpg ....]
|______ dogs: [dog.2000.jpg, dog.2001.jpg, dog.2002.jpg ...]

Next, we’ll assign variables with the proper file path for the training and validation set.
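
The article doesn’t show this code inline; a minimal sketch, assuming the filtered archive is downloaded with tf.keras.utils.get_file from the Google storage mirror used in the official TensorFlow tutorial, could look like this:

import os
import tensorflow as tf

# Download and extract the filtered Dogs vs Cats dataset
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

# File paths for the training and validation sets
train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')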

If you’d like to know how many cat and dog images are in the training and validation directories, use the following code:
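
Something along these lines should work (a sketch that assumes the directory variables defined above):

# Count the images in each class directory
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))
num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val

print('Total training cat images:', num_cats_tr)
print('Total training dog images:', num_dogs_tr)
print('Total validation cat images:', num_cats_val)
print('Total validation dog images:', num_dogs_val)
print('Total training images:', total_train)
print('Total validation images:', total_val)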

If you get a different output, don’t fret; you’ve probably used a different dataset.

Prepping our data

Our images have to be formatted into appropriately pre-processed floating-point tensors before we can feed them to the network.
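
Keras’ ImageDataGenerator class handles the reading, decoding, rescaling and batching for us. The sketch below is one way to set it up; the batch size and image dimensions are assumptions, not values taken from the article.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 128      # assumed value
IMG_HEIGHT = 150      # assumed value
IMG_WIDTH = 150       # assumed value

# Rescale pixel values from [0, 255] to [0, 1]
train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)

train_data_gen = train_image_generator.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')

val_data_gen = validation_image_generator.flow_from_directory(
    batch_size=batch_size,
    directory=validation_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')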

Visualizing the training images

We will now extract a batch of images from the training generator and then plot five of them with matplotlib.
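
A small helper along these lines does the job (assuming the train_data_gen generator created in the previous step):

import matplotlib.pyplot as plt

# Grab one batch of images from the training generator
sample_training_images, _ = next(train_data_gen)

def plot_images(images):
    # Plot the given images in a single row
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    for img, ax in zip(images, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

plot_images(sample_training_images[:5])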

5 training images

Creating the model
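
The exact architecture isn’t shown inline in this article; a reasonable sketch is a small stack of convolution and max-pooling layers, in the style of the Keras tutorials, ending in a single sigmoid unit that outputs a class probability.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu',
           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')    # probability that the image belongs to one class
])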

Compiling the model

In this article, we’ll use the Adam optimizer and the binary cross-entropy loss function.
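
With the model above, compiling comes down to a single call:

# Adam optimizer + binary cross-entropy, as described above
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()   # prints a layer-by-layer overview of the network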

Training our model

We’ll use our model’s fit_generator method (replaced by model.fit in newer versions of TensorFlow) to train the network on the batches produced by our ImageDataGenerator.
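
A sketch of the training step, assuming the generators and image counts from earlier (the number of epochs is an assumption):

epochs = 15   # assumed value

# On older Keras versions this call was model.fit_generator(...)
history = model.fit(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size)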

Allow this to run — it will take some time. Have patience!

Visualizing the training results with a Plot
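
One way to plot the accuracy and loss curves from the history object (on older Keras versions the keys are 'acc' and 'val_acc' instead of 'accuracy' and 'val_accuracy'):

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()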

Training and Validation Plot

As you can see from the graphs, the training accuracy and validation accuracy are off by a large margin and the model has achieved only around 70% accuracy on the validation set.

Overfitting

In the plots above, the training accuracy increases linearly over time, whereas the validation accuracy stalls at around 70% during training. The difference between the training and validation accuracy is also noticeable: a sign of overfitting.

When there are a small number of training examples, the model sometimes learns noise (unwanted details) from the training examples. This negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting.

Overfitting: Electoral Precedent (Source: XKCD)

This means that the model will have a difficult time generalizing on a new dataset.

There are multiple ways to fight overfitting in the training process. In this article, we’ll use data augmentation and add dropout to our model.

Data augmentation

Overfitting generally occurs when there are a small number of training examples. One way to fix this problem is to augment the dataset so that it has a sufficient number of training examples.

The goal is that our model should never see the same picture twice during the training phase. This helps expose it to more aspects of the data which will ultimately increase its recognition accuracy.

Augmenting and visualizing the data

In this article, we’ll apply random horizontal flip, rotation and zoom augmentations to the dataset and see what individual images look like after the transformation.
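
A sketch of a training generator with those three augmentations (the specific parameter values are assumptions):

# Random flip, rotation and zoom on top of rescaling
image_gen_train = ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    zoom_range=0.5,
    horizontal_flip=True)

train_data_gen = image_gen_train.flow_from_directory(
    batch_size=batch_size,
    directory=train_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')

# Re-use the plotting helper from earlier to see five augmented samples
augmented_images = [train_data_gen[0][0][0] for _ in range(5)]
plot_images(augmented_images)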

Creating a validation data generator

In general, we only apply data augmentation to the training examples. In this case, we’ll rescale the validation images and convert them into batches.
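
The validation generator is therefore just a rescaling generator, along these lines:

# No augmentation here: only rescaling and batching
image_gen_val = ImageDataGenerator(rescale=1./255)

val_data_gen = image_gen_val.flow_from_directory(
    batch_size=batch_size,
    directory=validation_dir,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='binary')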

Dropout

Another technique to reduce overfitting is to introduce dropout to the network.

Dropout is a form of regularization that randomly knocks out units (i.e., neurons) from your neural network, so on every iteration you end up working with a smaller network.

The intuition here is: You can’t rely on any one feature, so you have to spread out the weights.

When you apply dropout to a layer, it randomly drops out (sets to zero) a fraction of the layer’s output units during training. It takes a fractional number as its argument, such as 0.1 or 0.3, which means 10% or 30% of the output units are randomly dropped from the layer it’s applied to.

Creating a new network with Dropouts
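
A sketch of the earlier architecture with two Dropout layers added (the dropout rate of 0.2 is an assumption):

from tensorflow.keras.layers import Dropout

model_new = Sequential([
    Conv2D(16, 3, padding='same', activation='relu',
           input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)),
    MaxPooling2D(),
    Dropout(0.2),       # randomly drop 20% of the units at this point
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid')
])

model_new.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])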

Visualizing the new model
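
To see the effect, retrain on the augmented generators and run the same plotting code as before on the new history object:

history = model_new.fit(
    train_data_gen,
    steps_per_epoch=total_train // batch_size,
    epochs=epochs,
    validation_data=val_data_gen,
    validation_steps=total_val // batch_size)

# Re-run the accuracy/loss plotting code from the previous section on this history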

You will notice that there is significantly less overfitting than before. The accuracy should go up after training the model for more epochs.

A more accurate model

That’s it! We’ve successfully built an image classifier that can tell cats from dogs in an image. As with all neural networks, we couldn’t reach 100% accuracy, but we were able to improve our model’s accuracy considerably by exposing it to the same number of images, but with augmentations like rotation, flip and zoom.

As always, thanks so much for reading! Please tell me what you think or would like me to write about next in the comments. I’m open to criticism as well!

See you in the next post! 😄
