
PyTorch is a Deep Learning framework developed by Facebook’s AI Research lab (FAIR). Thanks to its C++ and CUDA backend, its N-dimensional arrays, called Tensors, can be used on GPUs as well.
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning architecture which takes an input image and assigns importance (weights and biases) to various features to help distinguish between images.

A Neural Network is broadly classified into 3 layers:
- Input Layer
- Hidden Layer (can consist of one or more such layers)
- Output Layer
The hidden layers can be further divided mainly into two kinds:
- Convolution Layer: this extracts features from the given input images.
- Fully Connected (Dense) Layers: these assign importance to the features from the convolution layers to generate the output.
Training a convolutional neural network generally involves two steps:
- Forward Propagation: the weights and biases are randomly initialized at the start, and a pass through the network generates the output at the end.
- Back Propagation: depending on the error in that output, the weights and biases are updated. Forward propagation then runs again with these updated values, and the cycle repeats until the error is minimized.

Activation functions are mathematical equations assigned to each neuron in the neural network; they determine whether the neuron should be activated or not depending on its importance (weight) in the image.
There are two types of activation functions:
- Linear Activation Functions: these take the inputs, multiply them by the weights for each neuron, and create an output signal proportional to the input. The problem with linear activation functions is that their derivative is a constant, so backpropagation cannot be used meaningfully because the gradient has no relation to the input. Further, stacking linear layers collapses them all into one, turning the neural network into a single-layer network.
- Non-Linear Activation Functions: these create complex mappings between the different layers, which helps the network learn and model the data better. A few of the non-linear activation functions are sigmoid, tanh, softmax, ReLU (Rectified Linear Unit) and Leaky ReLU.
Enough theory, let’s get started.
Github
If you are familiar with Github, check out my repository for the code and dataset.
Shubhankar07/Zoo-Classification-using-Pytorch-and-Convolutional-Neural-Networks

Dataset
To train a model, the first order of business is to find a dataset. For the project, the dataset used is a modified version of the original dataset from Kaggle:
Original dataset: https://www.kaggle.com/c/swdl2020/overview
Customized dataset: https://drive.google.com/file/d/1KRqqs2hi2KagfGAcTdWQPQUZO6RX2VE6/view?usp=sharing
There are 3 classes in the dataset corresponding to the three animals: Tiger, Hyena and Cheetah.

The dataset has already been divided into 2 folders, namely training and validation. The training folder contains 2700 images (900 per animal, each in its own folder) and the validation folder contains 300 images (100 per animal, again in separate folders).
Importing the Libraries
If you have not yet installed the PyTorch library, you can use the following commands:
If you are running on Anaconda, then once you are in the virtual environment, run this command:
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
If you want to install it locally using pip, the following command fetches the appropriate wheel files:
pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f
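
Here is a minimal sketch of the imports used throughout the rest of this walkthrough (your exact list may differ slightly from the original notebook):

import os
import torch
import torchvision
import matplotlib.pyplot as plt
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid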
Understanding the Dataset
Once we have imported the required libraries, let’s import the dataset and understand it.
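As a sketch, assuming the dataset has been extracted to a folder containing the training and validation subfolders (the path below is a placeholder):

data_dir = './zoo-dataset'               # placeholder: point this to where you extracted the dataset
print(os.listdir(data_dir))              # should show the training and validation folders
classes = os.listdir(data_dir + '/training')
print(classes)                           # the three animal classes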
As we can see, there are a total of 3 classes, each representing a particular animal. data_dir is the path where the dataset is stored. If you are working in an online environment, the dataset needs to be either uploaded to that environment or saved to your drive, as in the case of Google Colab. If you are working in a local environment, insert your local dataset path here.
Now let’s convert the dataset into an N-Dimensional Tensor.
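One way to do this is with torchvision’s ImageFolder dataset and a ToTensor transform, assuming the folder layout above:

dataset = ImageFolder(data_dir + '/training', transform=ToTensor())
print(len(dataset))                      # 2700
print(dataset.classes)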
The variable dataset now holds all the training images as tensors. As stated, the size of the dataset is 2700.
We shall now use the validation folder as the testing folder. The size of the test dataset is 300.
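And similarly for the test set:

test_dataset = ImageFolder(data_dir + '/validation', transform=ToTensor())
print(len(test_dataset))                 # 300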
Now let’s check the dataset.
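For example, inspecting the first image and its label:

img, label = dataset[0]
print(img.shape, label)                  # torch.Size([3, 400, 400]) 0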
Each image has the shape 3x400x400, indicating 400×400 image dimensions, with 3 denoting the RGB colour channels.
Let us now display a few of the images in the dataset.
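A minimal sketch using matplotlib (permute moves the colour channel last so imshow can render the tensor):

def show_example(img, label):
    print('Label:', dataset.classes[label])
    plt.imshow(img.permute(1, 2, 0))
    plt.show()

show_example(*dataset[0])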
Preparing the Dataset for Training
We shall now split the training set into two for training and validation.
For this, we will use the random_split() function.
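A sketch of the split; the 2430/270 sizes and the fixed seed below are choices made here for illustration, not necessarily those of the original run:

torch.manual_seed(42)                    # fixed seed so the split is reproducible
val_size = 270                           # roughly 10% of the training images
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))        # 2430 270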
Let’s keep the batch size at 32 and load the data.
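For example (num_workers and pin_memory are optional performance tweaks):

batch_size = 32
train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=2, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size * 2, num_workers=2, pin_memory=True)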
Let’s look at the batch.
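One way to visualize a batch is with torchvision’s make_grid helper:

def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(images, nrow=8).permute(1, 2, 0))
        plt.show()
        break

show_batch(train_dl)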
Now that we have finished with the preparations, let’s move onto the model.
Building the Model
Over here, we use cross_entropy or log loss as the loss function. In the case of multiclass classification, the cross_entropy for a single image is:

cross_entropy = −Σ y_c · log(p_c), summed over the M classes,

where y_c is 1 if the image actually belongs to class c and 0 otherwise, and p_c is the predicted probability for class c.
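Below is a sketch of a base class that wraps the training and validation steps around F.cross_entropy; the class and helper names are illustrative, not necessarily those of the original notebook:

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                               # forward pass
        return F.cross_entropy(out, labels)              # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))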
Now let’s check if a GPU is available and use it if available.
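For example:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)                            # expect cuda if a GPU is available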
GPU has been selected now. If ‘cuda’ is not available, check your settings if you are working locally. If you are working on Kaggle, ensure no other GPU sessions are active and that you haven’t used up the free monthly quota of 30 hours.
We shall now define the layers of the network.
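A sketch of a simple fully connected baseline with 3 hidden layers and ReLU activations; the hidden-layer sizes and the class name are illustrative, while the input size follows from the 3×400×400 images:

input_size = 3 * 400 * 400
num_classes = 3

class ZooModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Flatten(),                        # 3x400x400 image -> 480000-dim vector
            nn.Linear(input_size, 128),          # hidden layer 1
            nn.ReLU(),
            nn.Linear(128, 64),                  # hidden layer 2
            nn.ReLU(),
            nn.Linear(64, 32),                   # hidden layer 3
            nn.ReLU(),
            nn.Linear(32, num_classes))          # output layer

    def forward(self, xb):
        return self.network(xb)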
We have defined 3 hidden layers. For the activation function, we are using ReLU (Rectified Linear Units).
Now that we have built a basic model, let’s try to fit it with various learning rates and epochs and check the accuracy.
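A sketch of the evaluation and training loop; the learning rate and epoch count in the example call are placeholders, and for brevity this first experiment runs on whichever device the tensors already live on:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history

model = ZooModel()
history = fit(10, 1e-3, model, train_dl, val_dl)     # example: 10 epochs at learning rate 1e-3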
Let us now analyze the model.
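For example, plotting the validation accuracy recorded after each epoch:

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')
    plt.show()

plot_accuracies(history)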
And let us finally evaluate the model.
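And scoring the held-out test set:

test_dl = DataLoader(test_dataset, batch_size * 2)
result = evaluate(model, test_dl)
print(result)                            # roughly 50% accuracy for this simple model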
As you can see, the accuracy is very low at just around 50.9%. This is because we haven’t added any convolution layers to the model yet.

Let us now build the convolution layers using Resnet-18.
ResNet, or Residual Network, is a convolutional neural network architecture whose powerful representational ability makes it possible to train networks that are hundreds or even thousands of layers deep while still achieving compelling performance. We are going to use ResNet-18, which indicates the network is 18 layers deep.
Let us now define the convolution layers.
We shall now use a pre-trained Resnet-18 model.
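A sketch that loads torchvision’s pre-trained ResNet-18 and replaces its final fully connected layer with one sized for our 3 classes (the class name is illustrative):

from torchvision import models

class ZooResnet(ImageClassificationBase):
    def __init__(self, num_classes):
        super().__init__()
        self.network = models.resnet18(pretrained=True)      # download ImageNet weights
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, num_classes)   # replace the final layer

    def forward(self, xb):
        return self.network(xb)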
Let us now see the output shape.
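For example, passing one batch through the network:

model = ZooResnet(num_classes=3)
for images, labels in train_dl:
    out = model(images)
    print(out.shape)                     # torch.Size([32, 3]): one score per class
    break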
Check if GPU is available and assign it as the device.
Confirm that the device type is GPU.
Load the data for training and testing.
Assign the model.
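A sketch of these four steps, using a small DeviceDataLoader wrapper to move each batch to the GPU as it is yielded (the helper names are illustrative):

def to_device(data, device):
    # Move tensor(s) to the chosen device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    # Wrap a DataLoader so every batch is moved to the device as it is yielded
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)                                        # confirm the device type is cuda

train_dl = DeviceDataLoader(DataLoader(train_ds, batch_size, shuffle=True,
                                       num_workers=2, pin_memory=True), device)
val_dl = DeviceDataLoader(DataLoader(val_ds, batch_size * 2,
                                     num_workers=2, pin_memory=True), device)

model = to_device(ZooResnet(num_classes=3), device)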
Let us now check the initial loss and accuracy.
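For example:

history = [evaluate(model, val_dl)]
print(history)                           # validation loss and accuracy before any training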
The accuracy is initially around 35%. Let us now set the hyperparameters and start training the model.
opt_func stands for optimizer function. Optimizers tie together the loss function and the model parameters by updating the model in response to the output of the loss function. The optimizer we are going to use is Adam. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data. It combines the benefits of AdaGrad (Adaptive Gradient Algorithm) and RMSProp (Root Mean Square Propagation).
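A sketch of the hyperparameters and the training call; the epoch count and learning rate below are example values, not necessarily those of the original run:

num_epochs = 10                          # example value
lr = 1e-4                                # example value
opt_func = torch.optim.Adam

history += fit(num_epochs, lr, model, train_dl, val_dl, opt_func)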
Let us now analyze the model.
And now, let us finally evaluate the model.
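Reusing the helpers defined earlier:

plot_accuracies(history)

test_dl = DeviceDataLoader(DataLoader(test_dataset, batch_size * 2), device)
result = evaluate(model, test_dl)
print(result)                            # accuracy of the ResNet-18 model on the test set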
Analysis
As we can see, the application of Convolution layers helped increase the accuracy to 89.5%. The accuracy can further be increased by having a larger training dataset and by further tuning the hyperparameters.
Conclusion
We have successfully built a Convolutional Neural Network model to classify zoo animals. There are quite a few similar classification datasets that one can work through to become familiar with Convolutional Neural Networks, PyTorch and related concepts.
