Classifying fashion apparel: Getting started with Computer Vision

Get started with Computer Vision by creating a model to classify images of fashion apparel.

Published in Towards Data Science, May 26, 2020

In this guide, you will train a neural network model to classify images of clothing such as shirts, coats and sneakers.

Whew! That sounds like a lot for a beginner tutorial. I mean, we are just getting started, right?

Not to worry! Don’t get overwhelmed; it’s okay if you don’t understand every detail yet. You will pick them up as you go deeper into the article, trust me :)

If you are totally new to machine learning, I would suggest you check out my beginner tutorial.

Here are the completed Colab Notebook and the GitHub repo.

With that said, let’s get started!

The Data

We will be using the Fashion-MNIST dataset. It consists of 70,000 square (28x28 pixel) grayscale images of 10 types of clothing, split into a training set of 60,000 images and a test set of 10,000 images.

Each type of apparel is assigned a numeric label:

0 - T-shirt/top
1 - Trouser
2 - Pullover
3 - Dress
4 - Coat
5 - Sandal
6 - Shirt
7 - Sneaker
8 - Bag
9 - Ankle boot

The fashion MNIST dataset | Source: ZALANDO RESEARCH
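Having these label names in code is handy later, when we want to show class names instead of numbers on plots. Here is a minimal sketch. It assumes we keep the names in a plain Python list (the name `class_names` is our own choice) whose index matches the numeric label:

```python
# Index i holds the human-readable name of numeric label i in Fashion-MNIST.
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

# Look up the name that belongs to a numeric label, e.g. label 7.
print(class_names[7])  # Sneaker
```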

Let’s get to the code

We will use TensorFlow and its high-level Keras API (tf.keras) to build our model.

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

Keras is TensorFlow’s high-level API for building and training deep learning models.

You can read more about them. A basic idea of what these tools do is enough for now; you will learn more about TensorFlow and Keras as you go along.

Less talking, more code!!!

Importing the libraries

We will also use NumPy and Matplotlib as helper libraries.

We start by importing all the necessary libraries
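The original import cell is not reproduced here. A minimal version, assuming TensorFlow 2.x is installed (Keras then comes bundled as `tf.keras`), might look like this:

```python
# TensorFlow, with its Keras API available as tf.keras
import tensorflow as tf

# NumPy for array manipulation and Matplotlib for plotting images
import numpy as np
import matplotlib.pyplot as plt

print("TensorFlow version:", tf.__version__)
```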

Importing the data

The Fashion-MNIST data is readily available in Keras datasets.

We import the Fashion MNIST data from Keras datasets
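Keras provides a loader for this dataset, `tf.keras.datasets.fashion_mnist.load_data()`. It downloads the data the first time it is called, caches it locally, and returns two (images, labels) tuples. Here is a sketch of the loading step, assuming network access on the first run:

```python
import tensorflow as tf

# Downloads (on first call) and caches the dataset, then returns
# ((train_images, train_labels), (test_images, test_labels)) as NumPy arrays.
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```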

This loads the Fashion-MNIST data into four NumPy arrays:

The train_images and train_labels arrays are the training set, i.e. the data the model uses to learn.
The model is tested against the test set: the test_images and test_labels arrays.

Exploring the data

The following code shows that there are 60,000 training images and 10,000 test images, each of 28x28 pixels. We will train the model on the training images and measure its performance by making predictions on the test images. The label for each image is stored at the same index in train_labels or test_labels.

We find out about the data we loaded
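Here is a sketch that loads the data and prints the shapes shown below. The label strings are our choice, picked to match that output:

```python
import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = (
    tf.keras.datasets.fashion_mnist.load_data()
)

# Each image is a 28x28 array; each label is a single integer from 0 to 9.
print("Train Images Shape:", train_images.shape)
print("Train Labels Shape:", train_labels.shape)
print("Test Images Shape:", test_images.shape)
print("Test Labels Shape:", test_labels.shape)
```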

Train Images Shape: (60000, 28, 28)
Train Labels Shape: (60000,)
Test Images Shape: (10000, 28, 28)
Test Labels Shape: (10000,)

Now let’s take a look at the data we just loaded.

Let’s take a look at the loaded images!
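The plotting code is not reproduced here. The sketch below is one way to draw such a grid with Matplotlib. `plot_samples` and its parameters are hypothetical names of ours, and the function expects the label names as a list ordered by numeric label:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_samples(images, labels, class_names, n=25):
    """Show the first n images in a square-ish grid, titled with their class names."""
    n = min(n, len(images))
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for i, ax in enumerate(axes.ravel()):
        ax.axis("off")  # hide ticks; any unused cells stay blank
        if i < n:
            ax.imshow(images[i], cmap=plt.cm.binary)  # high pixel values drawn dark
            ax.set_title(class_names[labels[i]], fontsize=8)
    fig.tight_layout()
    return fig
```

Usage: `plot_samples(train_images, train_labels, class_names)` followed by `plt.show()`.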

The loaded images from the Fashion MNIST dataset

Since the pixel values lie between 0 and 255, we scale them to values between 0 and 1 by dividing every pixel value by 255.0.

Normalising the images
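Here is a small sketch of this scaling step as a helper function (`normalize` is our own name). Casting to float32 first is also our choice; float32 is the dtype Keras layers use by default:

```python
import numpy as np


def normalize(images):
    """Scale uint8 pixel values from [0, 255] to floats in [0.0, 1.0]."""
    return images.astype("float32") / 255.0


# Tiny example: pixel values at the bottom, inside and at the top of the range.
pixels = np.array([0, 51, 255], dtype=np.uint8)
print(normalize(pixels))
```

In the tutorial's flow this would be applied as `train_images = normalize(train_images)` and `test_images = normalize(test_images)`. Both sets must be scaled the same way, or the model would see differently scaled inputs at test time.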

Creating the model

Building a neural network requires configuring the layers of the model and then compiling the model.

Layers are the basic building blocks of a neural network. They extract features, or representations, from the data fed into them. After training, these features should help us solve the problem at hand: classifying fashion apparel.

Here we will chain together some simple layers to create our model.

We build our model by chaining layers
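The original model cell is not reproduced here. The text below describes a Flatten layer, then a 128-neuron Dense layer, then a 10-neuron Dense output layer that produces probability scores. A sketch following that description might look like this. The ReLU activation on the hidden layer and the softmax activation on the output layer are our assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                   # one 28x28 grayscale image
    tf.keras.layers.Flatten(),                        # 28x28 -> 784 values
    tf.keras.layers.Dense(128, activation="relu"),    # hidden fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),  # one probability per class
])

model.summary()
```

The parameter count follows from the layer sizes:
- Hidden layer: 784 × 128 weights + 128 biases = 100,480 parameters.
- Output layer: 128 × 10 + 10 = 1,290 parameters.
- Total: 101,770 parameters.

The Flatten layer adds none.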

The first layer of the network, tf.keras.layers.Flatten, transforms each image from a 2D array (28x28 pixels) into a 1D array of 28*28 = 784 values. It takes the input image and lines up its rows of pixels end to end. This layer only reshapes the data; it has no parameters to learn.
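What Flatten does to each image can be reproduced with NumPy's `reshape`. Here is a small sketch on a made-up batch. The values are arbitrary and were chosen only so each pixel is easy to track:

```python
import numpy as np

# A made-up batch of 2 "images", 28x28 each, filled with 0, 1, 2, ...
batch = np.arange(2 * 28 * 28).reshape(2, 28, 28)

# Keep the batch axis and merge each image's rows, in order, into one vector.
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (2, 784)
# The second row of the first image starts right after the first row ends:
print(flat[0, 28] == batch[0, 1, 0])  # True
```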

Once the input images have been transformed by the Flatten layer, the network then has two tf.keras.layers.Dense layers.

These are, well, densely connected (or fully connected) layers: every neuron in a Dense layer is connected to every output of the previous layer.

The first Dense layer has 128 neurons. The second Dense layer, which is the last layer of our network, has 10 neurons. This last layer is the output layer: each of its 10 nodes holds a probability score indicating how likely it is that the current image belongs to the corresponding class. (Remember that there are 10 classes of apparel in our data.)
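To make the "probability score" idea concrete: a softmax activation on the output layer turns 10 raw scores into 10 positive numbers that sum to 1. The predicted class is then the index of the largest one. Here is a NumPy sketch with made-up scores for a single image:

```python
import numpy as np


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()


# Made-up raw scores for one image, one per class (0 = T-shirt/top ... 9 = Ankle boot).
scores = np.array([0.5, -1.0, 0.2, 0.1, 0.3, -0.5, 1.2, -2.0, 0.0, -0.3])
probs = softmax(scores)

print(probs.sum())       # 1.0, up to floating-point rounding
print(np.argmax(probs))  # 6, i.e. class 6 ("Shirt") is the most likely
```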

Compile the model

We are almost ready to train our model! Before that, we have to configure a few more settings.

Loss function: This measures how far the model's predictions are from the correct labels during training. The model tries to minimize the loss at each training step, which “steers” it in the right direction.

Optimizer: Optimizers update the weight parameters to minimize the loss function.

Metrics: A metric is a function used to judge the performance of your model. Metric functions are similar to loss functions, except that the results of evaluating a metric are not used to train the model. Our model uses accuracy, the fraction of images that are correctly classified.
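Both ideas fit in a few lines of NumPy. With integer labels like ours, the sparse categorical cross-entropy loss is the negative log of the probability the model gives the correct class, averaged over the images. Accuracy is the fraction of images whose most likely class matches the label. The sketch below is simplified and is not the Keras implementation. It uses made-up predictions for three images and three classes:

```python
import numpy as np

# Made-up predicted probabilities: 3 images (rows) x 3 classes (columns).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])
labels = np.array([0, 1, 0])  # true class index of each image

# Loss: mean of -log(probability assigned to the true class).
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.mean(np.log(true_class_probs))

# Accuracy: fraction of images whose most likely class is the true one.
accuracy = np.mean(np.argmax(probs, axis=1) == labels)

print(f"loss={loss:.4f}, accuracy={accuracy:.4f}")  # loss=0.5946, accuracy=0.6667
```

The misclassified third image contributes the most to the loss (-log 0.3 ≈ 1.20). This is why minimizing the loss pushes the model toward assigning high probability to the correct class.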

We compile the model
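The compile step is not reproduced in the text. The sketch below assumes the Adam optimizer, the `sparse_categorical_crossentropy` loss and the accuracy metric. It also assumes the architecture described above, with a softmax output layer (our assumption), so the loss receives probabilities:

```python
import tensorflow as tf

# Architecture as described above; the activations are our assumption.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",                        # adaptive gradient-based optimizer
    loss="sparse_categorical_crossentropy",  # labels are integers 0-9, not one-hot
    metrics=["accuracy"],                    # fraction of correctly classified images
)
```

If the last layer had no softmax and output raw scores ("logits") instead, the loss would need to be `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`.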

You don’t need to know all the details about the sparse_categorical_crossentropy loss function or the Adam optimizer for now. You can check out the docs if you want to learn more. For now, a grasp of what loss functions and optimizers do is enough.

Train the model

For training our model, we simply feed the model our training data and labels contained in train_images and train_labels respectively.

We call the model.fit method to “fit” the model to the training data.

Finally we train our model
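Here is a sketch of the training call, wrapped in a helper so it can be tried on any arrays (`build_and_train` is a hypothetical name of ours). With Keras's default batch size of 32, one epoch over 60,000 images takes 60,000 / 32 = 1875 steps. That matches the `1875/1875` in the log below:

```python
import tensorflow as tf


def build_and_train(images, labels, epochs=10):
    """Build, compile and fit the classifier; returns (model, history).

    `images` should already be normalized to [0, 1] with shape (n, 28, 28),
    and `labels` should hold integer class ids 0-9.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # fit() passes over the data `epochs` times, updating the weights after each batch
    history = model.fit(images, labels, epochs=epochs)
    return model, history
```

Usage: `model, history = build_and_train(train_images, train_labels, epochs=10)`. `history.history["loss"]` and `history.history["accuracy"]` then hold one value per epoch, matching the numbers printed in the log.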

Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3768 - accuracy: 0.8636
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3394 - accuracy: 0.8762
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3145 - accuracy: 0.8851
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2965 - accuracy: 0.8902
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2818 - accuracy: 0.8957
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2698 - accuracy: 0.9002
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2582 - accuracy: 0.9043
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2495 - accuracy: 0.9074
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2409 - accuracy: 0.9095
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2324 - accuracy: 0.9137

We can see the loss and accuracy metrics displayed as the model is being trained. As the model trains, the loss decreases and accuracy increases. Kudos! Your model is learning!

Evaluate the model

The model reaches about 90% (0.90) accuracy on the training data. Your values may differ slightly, since training involves some randomness (for example, in how the weights are initialized and how the data is shuffled).

But that is not enough! We still haven’t tested the model. We will now test our model on our test data, which the model has never seen before! Let’s see how it performs.

Let’s see how our model performs on the test data!
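The evaluation call is not reproduced either. Keras models provide `model.evaluate`, which returns the loss followed by each compiled metric. With `verbose=2`, it prints a one-line summary per call. Here is a sketch wrapped in a hypothetical helper of ours, `report_test_accuracy`:

```python
import tensorflow as tf


def report_test_accuracy(model: tf.keras.Model, test_images, test_labels) -> float:
    """Evaluate a compiled model on held-out data and print its accuracy."""
    # evaluate() returns the loss first, then the metrics given to compile()
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    print("Test accuracy:", test_acc)
    return test_acc
```

Usage: `report_test_accuracy(model, test_images, test_labels)`. The test images must be normalized in the same way as the training images.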

313/313 - 1s - loss: 0.3366 - accuracy: 0.8838

Test accuracy: 0.8838000297546387

The accuracy on the test dataset (about 88%) turns out to be a little lower than on the training dataset (about 91%). A gap like this suggests that our model is overfitting the training data to some extent. We will not worry about that now; in future articles, we will discuss what causes overfitting and how to prevent it.

Making Predictions

Finally! We can now use our model to make predictions on images. Here we have a function that plots 100 random test images along with their predicted labels. If a prediction differs from the label in test_labels, we highlight it in red.

We build a function to see our model’s predictions
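The helper's code is not reproduced here. The sketch below is one way to write it, with these assumptions:
- The predicted probabilities come from `model.predict(test_images)`.
- `plot_predictions` and its parameters are hypothetical names of ours.
- Correct predictions are titled in black, which is also our choice.

The function picks `n` random images, titles each with its predicted class name, and colors the title red when the prediction is wrong:

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_predictions(images, true_labels, probs, class_names, n=100, seed=None):
    """Plot n random images titled with predicted classes; wrong ones in red.

    probs: array of shape (num_images, 10), e.g. from model.predict(images).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=min(n, len(images)), replace=False)
    predicted = np.argmax(probs, axis=1)  # most likely class for every image

    cols = int(np.ceil(np.sqrt(len(idx))))
    rows = int(np.ceil(len(idx) / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(1.5 * cols, 1.5 * rows), squeeze=False)
    for ax in axes.ravel():
        ax.axis("off")  # hide ticks and any unused grid cells
    for ax, i in zip(axes.ravel(), idx):
        color = "black" if predicted[i] == true_labels[i] else "red"
        ax.imshow(images[i], cmap=plt.cm.binary)
        ax.set_title(class_names[predicted[i]], color=color, fontsize=7)
    fig.tight_layout()
    return fig
```

Usage: `probs = model.predict(test_images)`, then `plot_predictions(test_images, test_labels, probs, class_names, n=100)` and `plt.show()`.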

We got a few errors, but for such a simple model, it did quite well!

Wow! You have done it! You have successfully created a model that can look at images of fashion apparel and classify them with good accuracy! When you think about it, all it took was a few lines of code.


The completed Colab Notebook is available here, and the code is also available on GitHub.

With this new knowledge of TensorFlow, Keras and machine learning in general, you will be able to create your own models for a wide variety of datasets. Moreover, the tools and techniques you learned here form the foundation of the more complex models used in practice.

In the coming tutorials, we will take a look at Convolutional Neural Networks (CNNs), a type of neural network widely used in computer vision applications. We will see that CNNs can improve the accuracy of our model further.

Happy Coding!