Deep Learning with CIFAR-10
Image Classification using CNN
Neural Networks are programmable patterns that help solve complex problems and produce the best achievable output. Deep Learning, as we all know, is a step ahead of Machine Learning: it trains Neural Networks to answer previously unanswered questions, or to improve on existing solutions!
In this article, we will be implementing a Deep Learning model using the CIFAR-10 dataset. The dataset is commonly used in Deep Learning for testing image classification models. It has 60,000 color images comprising 10 different classes. The image size is 32×32, and the dataset is split into 50,000 training images and 10,000 test images. The CIFAR-10 dataset is publicly available and ships with Keras.
Importing Data
Deep Learning models require a machine with high computational power. It is generally recommended to use online GPUs like those of Kaggle or Google Colaboratory; I have implemented the project on Google Colaboratory. For the project we will be using the TensorFlow and matplotlib libraries. Since the dataset is used globally, one can directly import it from the keras module of the TensorFlow library.
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
Pre-Processing the Data
The first step of any Machine Learning, Deep Learning or Data Science project is to pre-process the data. We will load the data and define the names of the 10 classes over which the dataset is distributed. Once the class names are set, we need to normalize the images so that our model can train faster. The pixel range of a color image is 0–255, so we divide each pixel by 255 to bring the range down to 0–1. Actually, we divide by 255.0, since it is a float operation. For the model, we will be using a Convolutional Neural Network (CNN).
# loading the data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# setting class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# normalizing pixel values to the 0-1 range
x_train = x_train / 255.0
x_train.shape
x_test = x_test / 255.0
x_test.shape
In the output of shape we see four values, e.g. (50000, 32, 32, 3). The first value (50,000 for the training set, 10,000 for the test set) is the number of images. The second and third values are the image height and width; here the image size is 32×32. The fourth value is 3, for the RGB format, since the images we are using are color images.
Building CNN model
A CNN model works in three stages. In the first stage, a convolutional layer extracts the features of the image. In the second stage, a pooling layer reduces the dimensionality of the image, so small changes in the input do not create a big change in the model's output; simply put, it helps prevent over-fitting. In the third stage, a flattening layer transforms the feature maps into one dimension and feeds them to a fully connected dense layer, which then performs the prediction. A good model has multiple convolutional layers and pooling layers.

While creating a Neural Network model, there are two generally used APIs: the Sequential API and the Functional API. The Sequential API allows us to create a model layer by layer and add each layer to the Sequential class. The drawback of the Sequential API is that we cannot use it to create a model with multiple input sources or outputs at different locations. To overcome this drawback, we use the Functional API, which lets us create multi-input and multi-output models. Still, in most cases the Sequential API is used, and we will be using it for our CNN model (a Functional API sketch follows below for contrast).
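For contrast, here is a minimal sketch of how the same first layer would look with the Functional API (illustrative only; everything that follows in this article uses the Sequential API):

import tensorflow as tf
# Functional API sketch: layers are called on tensors instead of added to a model
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu")(inputs)
functional_model = tf.keras.Model(inputs=inputs, outputs=x)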
cifar10_model=tf.keras.models.Sequential()
# First Layer
cifar10_model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation="relu", input_shape=[32, 32, 3]))
Since we are building a Convolutional Neural Network, we will be using a convolutional layer. The most commonly used layer, and the one we are using, is Conv2D. Conv2D means the convolution takes place over two spatial axes (height and width), applied across the three color channels: Red, Green and Blue. The other type of convolutional layer is Conv1D; Conv1D is generally used for text, while Conv2D is generally used for images. Most readers will already know what convolution is and how it is done; for anyone who wants a refresher, a short video on how convolution works in a CNN is worth looking up.
Parameters of the Conv2D layers
The first parameter is "filters". Its value is the number of filters the convolutional layer will learn. From each such filter, the convolutional layer learns something about the image, like a hue, a boundary, or a shape/feature. The value of this parameter is conventionally a power of 2.
The second parameter is "kernel_size". A kernel is a filter that moves across the image and extracts features from each patch using a dot product; kernel_size is the dimension (height × width) of that filter. The value of the kernel size is generally an odd number, e.g. 3, 5, 7, etc. Here we have used a kernel_size of 3, which means the filter size is 3 × 3.
The next parameter is "padding". There are two types of padding, SAME and VALID. In VALID padding, no zeros are padded on the boundary of the image, so when convolution takes place there is loss of data, as features at the border cannot be fully convolved. In SAME padding, a layer of zeros is padded on all the boundaries of the image, so there is no loss of data. Moreover, with stride 1 the spatial dimension of the output after convolution is the same as that of the input, which is the reason behind the name SAME. Since we cannot afford to lose data in the initial layers, we have used SAME padding.
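To see the difference concretely, here is a quick sanity check (a sketch using a dummy input, separate from the model we are building):

import tensorflow as tf
# one dummy 32x32 RGB image
x = tf.random.normal([1, 32, 32, 3])
same = tf.keras.layers.Conv2D(32, 3, padding="same")(x)
valid = tf.keras.layers.Conv2D(32, 3, padding="valid")(x)
print(same.shape)   # (1, 32, 32, 32): spatial size preserved
print(valid.shape)  # (1, 30, 30, 32): 32 - 3 + 1 = 30, border information lost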

The reason behind using Deep Learning models is to solve complex problems. To get a good fit, the model has to capture complicated relationships, so we need functions that can handle the non-linear complexity of the data. This is done by using an activation function, and any deep learning model needs a minimum of one layer with an activation function. The job of an activation function is to add non-linearity to the model. If we do not add it, the model becomes a simple linear regression model and would not achieve the desired results, as it is unable to fit the non-linear part.
There are 4 famous activation functions:
- Sigmoid function: the value range is between 0 and 1. The graph is steep around the origin, so even a small change in the input can bring a big difference in the output. It is mainly used for binary classification, as the demarcation can easily be made at values above or below 0.5.

- TanH function: an abbreviation of the Hyperbolic Tangent function, which is a rescaled version of the Sigmoid function. The mathematics behind these activation functions is out of the scope of this article, so I will not jump there. The range of the value is between -1 and 1.

- ReLU function: the abbreviation of Rectified Linear Unit, the most popular activation in deep learning. It is popular because it is cheap to compute: the mathematical function is simpler than the other activation functions.

- SoftMax function: a generalization of the Sigmoid function, used for multi-class classification. The function calculates the probability of each class, so the output values range between 0 and 1 and sum to 1. The primary difference is that the Sigmoid function is used for binary classification, while the SoftMax function extends this to multi-class classification.
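Since all four functions are simple formulas, a small sketch (using NumPy, which is not otherwise used in this project) makes their ranges easy to verify:

import numpy as np
z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
sigmoid = 1 / (1 + np.exp(-z))         # each value squashed into (0, 1)
tanh = np.tanh(z)                      # each value squashed into (-1, 1)
relu = np.maximum(0, z)                # negatives become 0, positives unchanged
softmax = np.exp(z) / np.exp(z).sum()  # values in (0, 1) that sum to 1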
Pooling layer
# Max Pooling Layer
cifar10_model.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))
A pooling layer is used to reduce the size of the image while keeping the important features in play, and thus helps to reduce the computation in the model. While performing convolution, the convolutional layer keeps information about the exact position of each feature, so even not-so-important features are located precisely. As a result, even a small change in a pixel or feature may lead to a big change in the output of the model. With Max Pooling we narrow down the scope: of all the features in a window, only the most important one is taken into account, and the aforementioned problem is solved. Pooling is done in two ways, Average Pooling or Max Pooling, with Max Pooling being the more commonly used.

In Average Pooling, the average value within the pool window is taken; in Max Pooling, the maximum value within the pool window is taken. A small worked sketch at the end of this section makes the difference concrete.

Pool size means the size of the window from which the max (or average) value will be taken. A pool size of 2 means a 2×2 window is used, and within that 2×2 window the max/average value becomes the output. The window traverses across the image, moving according to the value of strides.
Strides means how big a jump the window makes. If the stride is 1, the 2×2 window moves gradually, one column at a time. I have used a stride of 2, which means the window shifts two columns (and two rows) at a time, so consecutive windows do not overlap.
In pooling we use VALID padding, because we are ready to lose some information: the function of pooling is precisely to reduce the spatial dimension of the image and reduce computation in the model.
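Here is the promised worked sketch (with a made-up 4×4 input, separate from our model) comparing Max and Average Pooling with pool size 2 and stride 2:

import tensorflow as tf
x = tf.constant([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 8., 3., 2.],
                 [7., 6., 1., 0.]])
x = tf.reshape(x, [1, 4, 4, 1])  # (batch, height, width, channels)
max_pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid')(x)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2, padding='valid')(x)
print(tf.reshape(max_pool, [2, 2]))  # [[4. 8.] [9. 3.]]   max of each 2x2 window
print(tf.reshape(avg_pool, [2, 2]))  # [[2.5 6.5] [7.5 1.5]] average of each window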
Last Layers
# Flattening Layer
cifar10_model.add(tf.keras.layers.Flatten())
The Flattening layer is added after the stack of convolutional and pooling layers. It converts the 3D feature maps into a 1D vector, because the fully connected Dense layer added after this stack requires its input in one dimension; the flattening layer is therefore essential.
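To make the flattening concrete, here is the shape bookkeeping for the layers added so far (a sketch, assuming only the single Conv2D and MaxPool2D layers above):

# input            -> (32, 32, 3)
# Conv2D, SAME     -> (32, 32, 32)   32 filters, spatial size preserved
# MaxPool2D 2, 2   -> (16, 16, 32)   spatial size halved
# Flatten          -> (8192,)        16 * 16 * 32 values per image
cifar10_model.summary()  # prints the same shapes layer by layer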

After the flattening layer, there is a Dense layer. A Dense layer is a fully connected layer: it feeds all outputs from the previous layer to all of its neurons. A Dense layer has a weight matrix W, a bias b, and an activation that is applied to each element. Speaking in a lucid way, it connects all the dots. This layer uses all the features extracted before and does the work of classifying the image. The units parameter is the number of neurons the layer is going to use.
# Dropout Layer
cifar10_model.add(tf.keras.layers.Dropout(0.2))
# Adding the first fully connected layer
cifar10_model.add(tf.keras.layers.Dense(units=128, activation='relu'))
Now, to prevent overfitting, a dropout layer is added. During training, some neurons are disabled randomly; the value passed to the layer is the fraction of neurons to drop in each iteration. After such training, each neuron is less dependent on the weights of particular other neurons, and as a result the model can generalize better.
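A quick sketch (separate from our model) shows that dropout only fires during training; at inference time it acts as an identity operation:

import tensorflow as tf
layer = tf.keras.layers.Dropout(0.2)
x = tf.ones([1, 10])
print(layer(x, training=True))   # roughly 20% of values zeroed, the rest scaled by 1/0.8
print(layer(x, training=False))  # unchanged: all ones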

Output Layer
The output layer uses as many units as there are classes in the dataset; here we are using 10, as there are 10 classes. In the output we use the SoftMax activation, as it gives the probability of each class.
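In code, this is a Dense layer with 10 units and SoftMax activation (a minimal sketch, consistent with the description above):

# Output layer: 10 units, one per CIFAR-10 class
cifar10_model.add(tf.keras.layers.Dense(units=10, activation='softmax'))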
While compiling the model, we need to choose the loss function. Two loss functions are generally used here: Sparse Categorical Cross-Entropy (scce) and Categorical Cross-Entropy (cce). Both apply when the classes are mutually exclusive; the difference is the label format. Sparse Categorical Cross-Entropy is used when the labels are plain integers (0, 1, 2, ...), while Categorical Cross-Entropy is used when the labels are one-hot encoded vectors. In our scenario the labels come from cifar10.load_data() as integers, so we are using Sparse Categorical Cross-Entropy.
We will be using the commonly used Adam optimizer. Adam is an abbreviation for Adaptive Moment Estimation. This optimizer uses running estimates of the first and second moments of the gradient to adapt the learning rate for each weight. Adam is often preferred over plain stochastic gradient descent because this per-parameter adaptation usually makes training converge faster.
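Putting compilation and training together, a minimal sketch looks like this (the epoch count is a placeholder; as mentioned below, I tuned the number of epochs by hand):

# compile with Adam and sparse categorical cross-entropy, then train
cifar10_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
# epochs=15 is a placeholder value; tune it on your own runs
history = cifar10_model.fit(x_train, y_train, epochs=15,
                            validation_data=(x_test, y_test))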
After playing a bit with the number of epochs, I was able to get an accuracy of 78% with this model. So, in this article we went through the workings of a Deep Learning project on Google Colaboratory. We learned about the parameters used in the Convolutional and Pooling layers of a Convolutional Neural Network, saw that after extracting features in a CNN we need a dense layer and a dropout layer to turn those features into predictions, and finally looked briefly at the loss functions and the Adam optimizer.
You can find the complete code in my git repository: https://github.com/aaryaab/CIFAR-10-Image-Classification.
Feel free to connect with me at: https://www.linkedin.com/in/aarya-brahmane-4b6986128/
References: one can find and make some interesting function graphs at https://www.mathsisfun.com/data/function-grapher.php#functions
Happy Deep Learning!
Peace!!