Classifying fashion apparel: Getting started with Computer Vision
Get started with Computer Vision by creating a model to classify images of fashion apparel.
In this guide, you will train a neural network model to classify images of clothing such as shirts, coats, and sneakers.
Whew! That sounds like a lot for a beginner tutorial. I mean, we are just getting started, right?
Not to worry! Don’t get overwhelmed; it’s okay if you don’t understand all the details yet. You will pick them up as you go deeper into the article, trust me :).
If you are totally new to machine learning, I would suggest you check out my beginner tutorial.
Here are the completed Colab Notebook and the GitHub repo.
With that said, let’s get started!
The Data
We will be using the Fashion-MNIST dataset. It is composed of 70,000 square (28x28 pixel) grayscale images of 10 types of clothing: 60,000 images for training and 10,000 for testing.
Each type of apparel is assigned a particular label:
0 - T-shirt/top
1 - Trouser
2 - Pullover
3 - Dress
4 - Coat
5 - Sandal
6 - Shirt
7 - Sneaker
8 - Bag
9 - Ankle boot
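The labels are plain integers, so it helps to keep the names in a list where position i holds the name for label i. Here is a small sketch of that lookup; the names class_names and label_to_name are my own choice for illustration and are not part of the dataset.

```python
# Position i holds the clothing name for label i, in the order listed above.
# class_names and label_to_name are illustrative names, not part of the dataset.
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Return the clothing name for an integer label between 0 and 9."""
    return class_names[int(label)]


print(label_to_name(9))  # Ankle boot
```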
Let’s get to the code
We will use TensorFlow and TensorFlow Keras for building our model.
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
Keras is TensorFlow’s high-level API for building and training deep learning models.
You can read more about them. A basic idea of the tools is enough for now, as you will learn more about TensorFlow and Keras as you go along.
Less talking, more code!!!
Importing the libraries
We will also use numpy and matplotlib as helper libraries.
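A typical set of imports for this kind of project looks like the sketch below. The aliases np, plt and tf are the usual conventions, and printing the version is just a quick sanity check that TensorFlow is installed.

```python
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

# TensorFlow, which also gives us Keras as tf.keras
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
```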
Importing the data
The Fashion-MNIST data is readily available in Keras datasets.
This will load the fashion_mnist data into 4 NumPy arrays:
- The train_images and train_labels arrays are the training set, the data the model uses to learn.
- The model is tested against the test set, the test_images and test_labels arrays.
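Loading the data can be sketched as follows. It uses tf.keras.datasets.fashion_mnist.load_data(), which downloads the dataset the first time it is called, so it needs an internet connection on the first run.

```python
import tensorflow as tf

# load_data() returns two (images, labels) pairs: the training set and the test set.
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```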
Exploring the data
The shapes below show that there are 60,000 training images and 10,000 test images of 28x28 pixels. We will train the model on the training images and test its performance by making predictions on the test images. The images are labelled correspondingly in train_labels and test_labels.
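Printing the shapes might look like the sketch below. So that the snippet runs on its own without downloading anything, it uses zero-filled stand-in arrays with the same shapes as the real data (that is an assumption for illustration; in the notebook you would pass the four arrays returned by load_data). The helper name describe is also mine.

```python
import numpy as np


def describe(train_images, train_labels, test_images, test_labels):
    """Return one line per array, reporting its shape."""
    return [
        f"Train Images Shape: {train_images.shape}",
        f"Train Labels Shape: {train_labels.shape}",
        f"Test Images Shape: {test_images.shape}",
        f"Test Labels Shape: {test_labels.shape}",
    ]


# Stand-ins with the same shapes as Fashion-MNIST (assumption: swap in the real arrays).
train_images = np.zeros((60000, 28, 28), dtype=np.uint8)
train_labels = np.zeros((60000,), dtype=np.uint8)
test_images = np.zeros((10000, 28, 28), dtype=np.uint8)
test_labels = np.zeros((10000,), dtype=np.uint8)

for line in describe(train_images, train_labels, test_images, test_labels):
    print(line)
```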
Train Images Shape: (60000, 28, 28)
Train Labels Shape: (60000,)
Test Images Shape: (10000, 28, 28)
Test Labels Shape: (10000,)
Now let’s take a look at the data we just loaded.
Since the pixel values lie between 0 and 255, we convert them into values between 0 and 1, i.e. we just divide the pixel values by 255.0.
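Here is a minimal sketch of that scaling step. To keep it self-contained it uses a few random stand-in images instead of the real dataset (an assumption for illustration); the division itself is the same either way.

```python
import numpy as np

# Stand-in for a handful of grayscale images with integer pixel values 0..255.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by 255.0 converts the integers to floats in the range [0, 1].
train_images = train_images / 255.0

print(train_images.dtype, train_images.min(), train_images.max())
```

Remember to scale the test images in exactly the same way, so the model sees the same kind of input during training and testing.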
Creating the model
Building a neural network requires configuring the layers of the model and then compiling the model.
Layers are the basic building blocks of a neural network. They extract features or representations from the data that is fed into them. After training, these features will help us solve the problem at hand: classifying fashion apparel.
Here we will chain together some simple layers to create our model.
The first layer of the network, tf.keras.layers.Flatten, transforms the image, which is a 2D array (of 28x28 pixels), into a 1D array (of size 28*28 = 784). It basically takes the input image and lines up each row of pixels back to back. This layer only reshapes the data; it has no parameters to learn.
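If you want to see what "lining up the rows back to back" means, here is a small NumPy sketch. It is not the Keras layer itself, just the same reshaping done by hand on a toy image whose pixel values are simply their positions.

```python
import numpy as np

# Toy 28x28 "image": each pixel's value is its position when reading row by row.
image = np.arange(28 * 28).reshape(28, 28)

# Flattening lines the rows up one after another into a single 1D array.
flat = image.reshape(-1)

print(image.shape, "->", flat.shape)  # (28, 28) -> (784,)

# The second row of the image is found right after the first 28 values.
print(np.array_equal(flat[28:56], image[1]))  # True
```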
Once the input images have been transformed by the Flatten layer, the network then has two tf.keras.layers.Dense layers.
These are, well, densely connected (or fully connected) layers.
The first Dense layer has 128 neurons and the second Dense layer, which is the last layer of our network, has 10 neurons. The last layer is our output layer, which provides the output of the model. Each of its 10 nodes contains a probability score indicating how likely it is that the current image belongs to the corresponding class. (Remember there are 10 classes of apparel in our data.)
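Putting the three layers together could look like the sketch below. The layer sizes come from the description above; the activation functions (relu for the hidden layer and softmax for the output, so that the 10 outputs are probabilities) are my assumption, since the text does not name them.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                    # one 28x28 grayscale image
    tf.keras.layers.Flatten(),                         # 28x28 -> 784, no weights
    tf.keras.layers.Dense(128, activation="relu"),     # hidden layer (relu is an assumption)
    tf.keras.layers.Dense(10, activation="softmax"),   # one probability per class (softmax is an assumption)
])

model.summary()
```

The summary should report 784*128 + 128 weights for the first Dense layer and 128*10 + 10 for the second.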
Compile the model
We are almost ready to train our model! Before that, we have to configure a few more settings.
Loss function: This measures how far the model’s predictions are from the true labels during training. You want to minimize this function to “steer” the model in the right direction, i.e. the model tries to reduce the loss with each training step in order to improve.
Optimizer: Optimizers update the weight parameters to minimize the loss function.
Metrics: A metric is a function that is used to judge the performance of your model. Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. The following model uses accuracy, the fraction of the images that are correctly classified.
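To make the loss and the accuracy metric a bit more concrete, here is a tiny worked example in plain NumPy. It uses made-up probabilities for 3 images and only 3 classes (not the real model or data), and computes the sparse categorical cross-entropy as the average of -log(probability given to the true class).

```python
import numpy as np

# Made-up predicted probabilities: 3 images (rows) over 3 classes (columns).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
labels = np.array([0, 1, 2])  # true class of each image

# Loss: pick the probability assigned to the true class, then average -log of it.
true_class_probs = probs[np.arange(len(labels)), labels]
loss = -np.mean(np.log(true_class_probs))

# Accuracy: fraction of images whose most probable class is the true class.
predicted = probs.argmax(axis=1)
accuracy = np.mean(predicted == labels)

print("loss:", loss)
print("accuracy:", accuracy)
```

The third image is misclassified (class 1 gets the highest probability, but the true class is 2), so the accuracy is 2 out of 3, and that image contributes the largest share of the loss.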
You don’t need to know all the details about the sparse_categorical_crossentropy loss function or the adam optimizer for now. You can check out the docs if you need to learn more. For now, having a grasp of what loss functions and optimizers are is enough.
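The compile step with these three settings can be sketched like this. The snippet rebuilds a model first so that it runs on its own; the relu and softmax activations are my assumption, as the text does not name them.

```python
import tensorflow as tf

# Rebuild the model so this snippet is self-contained (activations are assumptions).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",                          # how the weights get updated
    loss="sparse_categorical_crossentropy",    # what the model tries to minimize
    metrics=["accuracy"],                      # what we report, not used for training
)
```

The "sparse" in the loss name means the labels are plain integers (0 to 9) rather than one-hot vectors, which matches how the labels are stored in this dataset.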
Train the model
For training our model, we simply feed the model our training data and labels, contained in train_images and train_labels respectively.
We call the model.fit method to “fit” the model to the training data.
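A minimal sketch of the training call is shown below. To keep it quick and runnable on its own, it trains on a small batch of random stand-in images (an assumption for illustration, so the accuracy it reaches is meaningless); with the real data you would pass the scaled train_images and train_labels instead. The activations in the model are also my assumption.

```python
import numpy as np
import tensorflow as tf

# Random stand-ins for scaled images and integer labels (not the real dataset).
rng = np.random.default_rng(0)
train_images = rng.random((64, 28, 28)).astype("float32")
train_labels = rng.integers(0, 10, size=64)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# One epoch is one full pass over the training data; verbose=0 hides the progress bars.
history = model.fit(train_images, train_labels, epochs=10, verbose=0)

print(sorted(history.history.keys()))
```

Leave out verbose=0 if you want to watch the per-epoch loss and accuracy while the model trains.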
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3768 - accuracy: 0.8636
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3394 - accuracy: 0.8762
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3145 - accuracy: 0.8851
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2965 - accuracy: 0.8902
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2818 - accuracy: 0.8957
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2698 - accuracy: 0.9002
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2582 - accuracy: 0.9043
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2495 - accuracy: 0.9074
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2409 - accuracy: 0.9095
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2324 - accuracy: 0.9137
We can see the loss and accuracy metrics displayed as the model is being trained. As the model trains, the loss decreases and accuracy increases. Kudos! Your model is learning!
Evaluate the model
The model reaches about 91% (0.91) accuracy on the training data. Your values should be close to this; don’t worry if they are slightly different, as training involves some randomness.
But that is not enough! We still haven’t tested the model. We will now test our model on our test data, which the model has never seen before! Let’s see how it performs.
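Evaluating on held-out data can be sketched with model.evaluate, which returns the loss followed by the metrics we asked for at compile time. As before, this snippet uses random stand-in images and an untrained model purely so that it runs on its own (both are assumptions for illustration); in the notebook you would pass your trained model the scaled test_images and test_labels.

```python
import numpy as np
import tensorflow as tf

# Random stand-ins for the scaled test set (not the real dataset).
rng = np.random.default_rng(0)
test_images = rng.random((32, 28, 28)).astype("float32")
test_labels = rng.integers(0, 10, size=32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# evaluate() only measures; it does not update the weights.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print("Test accuracy:", test_acc)
```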
313/313 - 1s - loss: 0.3366 - accuracy: 0.8838
Test accuracy: 0.8838000297546387
It turns out that the accuracy on the test dataset is a little lower than on the training dataset. This could mean that our model is over-fitting to our training data. We will not worry about that now. In future articles, we will discuss what causes over-fitting and how we can prevent it.
Making Predictions
Finally! We can now use our model to make predictions on images. Here we have a function to plot 100 random test images and their predicted labels. If a prediction is different from the label provided in the test_labels dataset, we will highlight it in red.
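One way such a plotting function could look is sketched below. It is not necessarily the function from the notebook: the name plot_predictions and its parameters are hypothetical, and the snippet is fed random stand-in images and labels so that it runs on its own. With a trained model you would get the predicted labels from np.argmax(model.predict(test_images), axis=1).

```python
import numpy as np
import matplotlib.pyplot as plt

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def plot_predictions(images, true_labels, predicted_labels, n=100, seed=0):
    """Plot n random images in a 10-column grid; wrong predictions get a red title."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(images), size=n, replace=False)
    cols = 10
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for ax, i in zip(np.atleast_1d(axes).ravel(), chosen):
        correct = predicted_labels[i] == true_labels[i]
        ax.imshow(images[i], cmap="gray")
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_title(class_names[predicted_labels[i]],
                     color="black" if correct else "red", fontsize=8)
    fig.tight_layout()
    return fig


# Random stand-ins (assumption): replace with test_images, test_labels and real predictions.
rng = np.random.default_rng(1)
images = rng.random((200, 28, 28))
true_labels = rng.integers(0, 10, size=200)
predicted_labels = rng.integers(0, 10, size=200)

fig = plot_predictions(images, true_labels, predicted_labels)
```

In a notebook the figure is displayed automatically; in a script, call plt.show() afterwards.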
Wow! You have done it! You have successfully created a model which can look at images of fashion apparel and classify them with good accuracy! When you think about it, all it took was a few lines of code.
We see a few errors, but for our first model, things are looking pretty good!
The completed Colab Notebook is available here and the code is also available in GitHub.
With this new knowledge of TensorFlow, Keras and machine learning in general, you will be able to create your own models for a wide variety of datasets. Moreover, the tools and techniques you learned here are the foundations of complex models used in practice.
In the coming tutorials, we will take a look at Convolutional Neural Networks (CNNs), a type of neural network used widely for computer vision applications. We will see that we can improve the accuracy of our model further using CNNs.
Happy Coding!