Introduction to Convolutional Neural Network (CNN) using Tensorflow

Govinda Dumane
Towards Data Science
7 min read · Mar 2, 2020


Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.

Machine Learning is being used in almost every industry. It helps people reduce their workload, since machines can perform many human tasks with high performance. Machines can do predictive analysis, such as classification and regression (predicting numerical values), and tasks like driving a car, which require some kind of intelligence.

Machine Learning is part of Artificial Intelligence: we provide data to the machine so that it can learn patterns from the data and predict solutions to similar future problems. A Neural Network (NN) is inspired by the network of neurons in the human brain. Computer Vision is a field of Artificial Intelligence that focuses on problems related to images. CNNs used in Computer Vision can perform complex tasks, ranging from classifying images to solving scientific problems in astronomy and building self-driving cars.

Geoffrey Hinton said, “I think people need to understand that deep learning is making a lot of things, behind the scenes, much better.”

So the question is: how do these machines learn to process images?

They use Convolutional Neural Networks to do this task effectively. Let’s understand what a Convolutional Neural Network, aka CNN, is.

As we know, an image is a 2-dimensional array of pixels, and any image can be classified based on its features. Scikit-learn algorithms such as SVM, decision trees and Random Forest are good at solving classification problems, but they cannot extract appropriate features from the image by themselves.
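To make this concrete, here is a small NumPy sketch with made-up pixel values: a grayscale image is a 2-D array with one intensity per pixel, and a colour image adds a third axis for its red, green and blue channels. That third axis is why the sample model later in this article uses input_shape=(32, 32, 3).

```python
import numpy as np

# A made-up 4 x 4 grayscale image: one intensity value (0-255) per pixel.
gray = np.array([
    [0, 50, 100, 150],
    [25, 75, 125, 175],
    [50, 100, 150, 200],
    [75, 125, 175, 255],
], dtype=np.uint8)

# A made-up 32 x 32 colour image: three values (R, G, B) per pixel.
rng = np.random.default_rng(0)
color = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

print(gray.shape)   # (4, 4)       -> height x width
print(color.shape)  # (32, 32, 3)  -> height x width x channels
print(gray[0, 2])   # 100: the pixel in row 0, column 2
```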

That’s where the Convolutional Neural Network comes into the picture. A CNN is a combination of convolutional layers and a neural network.

Basically, any neural network used for image processing consists of the following layers:

  • Input Layer
  • Convolutional Layer
  • Pooling Layer
  • Dense Layer

A convolution is simply a filter that is applied to an image to extract features from it. We use different convolutions to extract different features from the image, such as edges and highlighted patterns.

How convolution works on an image.

A convolution works like this: it uses a filter of some size (3 X 3 is a common choice). Starting from the top-left corner of the image, it performs element-wise multiplication between the filter and the patch of the image it covers. Element-wise multiplication means multiplying the elements with the same index. These products are summed up to obtain a single pixel value, which is stored in a new matrix. The filter then slides one pixel to the right (and, at the end of a row, one pixel down) and repeats the process until it has covered the whole image. We use this newly generated matrix for further processing.
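The steps above can be sketched in plain NumPy. The function name convolve2d, the 5 X 5 image and the vertical-edge filter are hypothetical examples chosen for illustration. Like the Conv2D layer in Keras, this sketch does not flip the filter (strictly speaking it is a cross-correlation), and it uses stride 1 and no padding.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the new matrix."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k_h, j:j + k_w]
            # element-wise multiplication, then sum -> one value in the new matrix
            output[i, j] = np.sum(patch * kernel)
    return output

# Hypothetical 5 x 5 image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

# Hypothetical 3 x 3 vertical-edge filter.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

features = convolve2d(image, kernel)
print(features)  # 3 x 3 matrix; every row is [0, 27, 27]
```

The large values mark the positions where the filter overlaps the dark-to-bright edge, which is what "extracting a feature" such as an edge means in practice.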

from tensorflow.keras import layers

layers.Conv2D(32, 3, activation='relu')
# 32 filters (convolutions) of size 3 X 3, with ReLU as the activation function.

Without padding, the size of the matrix decreases each time we apply a filter to the obtained matrix.

Size of new matrix = (Size of old matrix − filter size) + 1    (stride 1, no padding)
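Applying the formula layer by layer reproduces the output sizes in the model summary shown later in this article. In this sketch, conv_output_size and pool_output_size are hypothetical helper names, and the pooling rule (integer division by the pool size) assumes Keras's default non-overlapping 2 X 2 pooling with no padding.

```python
def conv_output_size(input_size, filter_size):
    """Side length after a convolution with stride 1 and no padding: (n - f) + 1."""
    return (input_size - filter_size) + 1

def pool_output_size(input_size, pool_size=2):
    """Side length after non-overlapping pooling; leftover rows/columns are dropped."""
    return input_size // pool_size

size = 32       # 32 x 32 input image
sizes = [size]
for layer in ["conv", "pool", "conv", "pool", "conv"]:
    if layer == "conv":
        size = conv_output_size(size, 3)   # 3 x 3 filters
    else:
        size = pool_output_size(size)      # 2 x 2 pooling
    sizes.append(size)

print(sizes)  # [32, 30, 15, 13, 6, 4]
```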

When we say that a convolutional layer has 32 filters, it means that 32 filters, initialized with random values and then learned during training, are applied to the image, producing 32 feature matrices for that image. These feature matrices are passed to the next layer as input.
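A quick way to see this, assuming TensorFlow 2.x is installed, is to call a Conv2D layer on a single made-up 32 X 32 RGB image: the layer creates 32 randomly initialized 3 X 3 filters and returns one 30 X 30 feature matrix per filter.

```python
import numpy as np
import tensorflow as tf

# Made-up input: a batch containing one 32 x 32 RGB image with random pixel values.
image = np.random.rand(1, 32, 32, 3).astype("float32")

# 32 filters of size 3 x 3; calling the layer creates them with random initial values.
conv = tf.keras.layers.Conv2D(32, 3, activation="relu")
feature_maps = conv(image)

print(feature_maps.shape)  # (1, 30, 30, 32): one 30 x 30 feature matrix per filter

kernel, bias = conv.get_weights()
print(kernel.shape)  # (3, 3, 3, 32): each filter also spans the 3 colour channels
print(bias.shape)    # (32,): one bias per filter
```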

After the convolutions comes another operation called pooling, which is used to reduce the size of the image (the feature matrices). There are two common types of pooling:

  1. Max Pooling: it selects the maximum value from each window of a specified size (the default size is 2 X 2). This method helps extract the most important features, i.e. those that are highlighted in the image.

A highlighted feature is a part of the image with high pixel values.

Max Pooling with size 2 X 2.
from tensorflow.keras.layers import MaxPooling2D

MaxPooling2D()  # Max-pooling layer
# default pool size is 2 X 2

  2. Average Pooling: unlike max pooling, average pooling takes the average of all the pixel values in each window (the default size is 2 X 2).

Average Pooling.

In the above example, the image size is 4 X 4 and the pooling size is 2 X 2. Starting from the top-left pixels, it calculates the average of each 2 X 2 chunk. For the 1st 2 X 2 chunk, the output value is (1 + 3 + 2 + 9) / 4 = 15 / 4 = 3.75. All other values are calculated in the same way, giving a 2 X 2 output.

from tensorflow.keras.layers import AveragePooling2D

AveragePooling2D()  # Average-pooling layer
# default pool size is 2 X 2
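Both pooling types can be sketched in NumPy with non-overlapping 2 X 2 windows, which is what MaxPooling2D() and AveragePooling2D() do by default. The pool2d function and the 4 X 4 input (the numbers 0 to 15) are hypothetical; the last line only re-checks the arithmetic of the worked average-pooling example above.

```python
import numpy as np

def pool2d(matrix, size=2, mode="max"):
    """Non-overlapping pooling (stride == window size)."""
    h, w = matrix.shape
    h, w = h - h % size, w - w % size   # drop rows/columns that don't fill a window
    # Split into blocks: (rows of windows, size, columns of windows, size)
    blocks = matrix[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

matrix = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4 x 4 input

print(pool2d(matrix, mode="max"))      # [[5, 7], [13, 15]]
print(pool2d(matrix, mode="average"))  # [[2.5, 4.5], [10.5, 12.5]]

# Arithmetic of the worked example: average of the first 2 x 2 chunk
print(np.mean([1, 3, 2, 9]))  # 3.75
```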

In most cases, max pooling is used, because in practice it usually performs better than average pooling at keeping the most prominent features.

In any neural network, the first layer is the input layer and the last is the output layer. The input layer holds the inputs; here, the inputs are images. These images are given as input to the first convolutional layer. The output of the 1st layer is given as input to the 2nd layer, and so on. This process continues until the last layer.

When defining the neural network, the first convolutional layer requires the shape of the images passed to it as input. After an image passes through all the convolutional and pooling layers, the output is passed to the dense layers.

We cannot pass the output of a convolutional layer directly to a dense layer, because the convolutional output is multi-dimensional, while a dense layer requires a one-dimensional input (a 1-D array) for each image.
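The mismatch is easy to see with NumPy. As an assumed example, take the output shape of the last convolutional layer in the model summary later in this article, 4 X 4 X 64 per image: reshaping it to one axis per image gives a 1,024-long vector that a dense layer can accept, which is essentially what a flatten step does for channels-last data.

```python
import numpy as np

# Assumed output of the last convolutional layer for a batch of 2 images:
# shape (batch, height, width, channels) = (2, 4, 4, 64)
conv_output = np.random.rand(2, 4, 4, 64)

# Keep the batch axis and merge the rest into one axis: 4 * 4 * 64 = 1024 values per image
flat = conv_output.reshape(conv_output.shape[0], -1)
print(flat.shape)  # (2, 1024)

# The values are only rearranged, not changed
print(np.array_equal(flat[0], conv_output[0].ravel()))  # True
```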

So we use a Flatten() layer between the convolutional and dense layers. Flatten() converts the multi-dimensional matrix into a one-dimensional array.

In a neural network, a non-linear function is used as the activation function.

Graph of a linear function.

A linear function is an expression whose highest exponent is 1. The graph of a linear function is a straight line.

Example: Y = 2X + 3.

Graph of a non-linear function.

A non-linear function, on the other hand, is one whose graph is not a straight line: it is a curve, or, as with ReLU, a line with a bend in it. For polynomials, this means the highest exponent is greater than 1.

Example: Y = X²
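Why insist on non-linear activations? A short NumPy sketch with hypothetical random weights shows the usual argument: two linear layers applied one after another collapse into a single linear layer, so stacking them adds nothing. A non-linear function such as ReLU, defined here as max(0, x) and used as the activation in the code in this article, has a bend at 0 and cannot be folded into a single matrix product that way.

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0, x)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # hypothetical weights of a first linear layer (3 -> 4)
W2 = rng.normal(size=(2, 4))  # hypothetical weights of a second linear layer (4 -> 2)
x = rng.normal(size=3)        # hypothetical input

# Two linear layers in a row ...
two_layers = W2 @ (W1 @ x)
# ... give exactly the same result as one linear layer with weights W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True

# ReLU is 0 for negative inputs and a straight line for positive ones;
# the bend at 0 is what makes it non-linear.
print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```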

A dense layer is a simple layer of neurons in which each neuron receives input from all the neurons of the previous layer, hence the name dense. Dense layers are used to classify the image based on the output of the convolutional layers.

Working of a single neuron. A layer contains many such neurons.

Each layer in the neural network contains neurons. Each neuron computes the weighted sum of its inputs plus a bias, and this value is passed through a non-linear function called an “activation function”. The result of the activation function is the output of that neuron. The same process is carried out for all neurons of all layers.
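A single neuron, and a whole dense layer of them, can be sketched in a few lines of NumPy. The inputs, weights and bias values below are made up, and ReLU is assumed as the activation function.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# One neuron with 3 inputs (all numbers are made up)
x = np.array([1.0, 2.0, 3.0])    # inputs coming from the previous layer
w = np.array([0.5, -1.0, 0.25])  # one weight per input
b = 1.0                          # bias

z = np.dot(w, x) + b   # weighted sum plus bias: 0.5 - 2.0 + 0.75 + 1.0 = 0.25
print(relu(z))         # 0.25

# A dense layer is many neurons at once: each row of W holds one neuron's weights
W = np.array([[0.5, -1.0, 0.25],
              [1.0,  0.0, -1.0]])
bias = np.array([1.0, 0.5])
print(relu(W @ x + bias))  # second neuron: 1.0 - 3.0 + 0.5 = -1.5, so ReLU outputs 0
```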

The output of the last layer will be considered as output for that image.

from tensorflow.keras import layers, models

# sample code for creating a Convolutional Neural Network
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))  # 32 x 32 RGB images
model.add(layers.MaxPooling2D())
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D())
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())  # 4 x 4 x 64 feature matrices -> 1-D array of 1024 values
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  # one output probability per class

Here, the output layer has 10 neurons with the softmax activation function. Softmax is used for multi-class classification, where each image belongs to exactly one of two or more classes. If we have 10 classes in total, the output layer has 10 neurons, one per class.

The 10 neurons return the probabilities that the input image belongs to the respective classes, and these probabilities sum to 1. The class with the highest probability is taken as the output for that image.
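Softmax can be written in a few lines of NumPy. This is a minimal sketch of the standard formula (exponentiate each score, then divide by the sum of the exponentials), not the Keras implementation; the ten raw scores are made-up numbers.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores from the last Dense layer for a 10-class problem
scores = np.array([1.0, 0.5, 3.0, 0.0, -1.0, 2.0, 0.1, 0.2, 0.3, 0.4])
probs = softmax(scores)

print(probs.round(3))
print(probs.sum())            # approximately 1.0
print(int(np.argmax(probs)))  # 2: index of the class with the highest probability
```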

In the same way, we pass every image through the convolutional layers and then the dense layers, which produce the corresponding output for each image.

model.summary()

The summary() method displays the architecture of the model.

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dense (Dense)                (None, 64)                65600
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________

Parameters (params) are the weights and biases that will be used for computation in all neurons of the CNN.
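The Param # column of the summary can be reproduced by hand: a Conv2D layer has (filter height X filter width X input channels + 1 bias) parameters per filter, and a Dense layer has (inputs + 1 bias) parameters per neuron, while pooling and flatten layers have none. The helper names below are hypothetical.

```python
def conv2d_params(filter_size, in_channels, filters):
    # each filter: filter_size * filter_size * in_channels weights, plus one bias
    return (filter_size * filter_size * in_channels + 1) * filters

def dense_params(inputs, units):
    # each neuron: one weight per input, plus one bias
    return (inputs + 1) * units

counts = [
    conv2d_params(3, 3, 32),        # conv2d   (RGB input has 3 channels)
    conv2d_params(3, 32, 64),       # conv2d_1
    conv2d_params(3, 64, 64),       # conv2d_2
    dense_params(4 * 4 * 64, 64),   # dense    (1024 flattened values in)
    dense_params(64, 10),           # dense_1
]
print(counts)       # [896, 18496, 36928, 65600, 650]
print(sum(counts))  # 122570
```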

When we train a model on a set of images, it learns specific values for all the parameters (i.e. the weights and biases), which are then used to process an image and predict its output.

Convolutional Neural Networks are primarily used for image classification, whether binary, multi-class or multi-label. There are pre-trained models such as Inception, VGG16, VGG19 and MobileNet, which researchers trained on millions of images to classify them into many categories. These models have already learned patterns that are useful for classifying images. If you are planning to build an image classifier, you can use one of these models as the base and add some dense layers of your choice at the end. The number of dense layers depends on your requirements and the number of output classes.
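As a rough sketch of that idea, assuming TensorFlow 2.x: the function below uses MobileNetV2 from tf.keras.applications as a frozen base and adds a Flatten layer and two dense layers on top. The function name build_transfer_model, the 160 X 160 input size, the Rescaling step and the layer sizes are illustrative choices, not the approach of the next article.

```python
import tensorflow as tf

def build_transfer_model(num_classes, input_shape=(160, 160, 3), weights="imagenet"):
    """Hypothetical helper: pre-trained MobileNetV2 base plus new dense layers."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,  # drop MobileNetV2's own 1000-class output layers
        weights=weights,    # "imagenet" downloads the pre-trained weights
    )
    base.trainable = False  # keep the learned convolutional filters fixed

    inputs = tf.keras.Input(shape=input_shape)
    # MobileNetV2 expects pixel values scaled from [0, 255] to [-1, 1]
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1)(inputs)
    x = base(x, training=False)  # run the base in inference mode
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

For example, build_transfer_model(num_classes=5) would download the ImageNet weights on first use and return a model with 5 output probabilities; passing weights=None builds the same architecture with random weights, which is handy for a quick shape check.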

In this article, we walked through the basics of Convolutional Neural Networks, which are very important for building image-classifier models. In the next article, let’s see how to build an image classifier using a pre-trained model.

If you want to learn more about Convolutional Neural Networks, you can refer to the official TensorFlow website.

Thanks for reading. 😊


Software Developer Engineer in Test. I work on Python, JS, Automation testing, Framework Development