Building a Simple Neural Network from Scratch

Have you ever wondered how a neural network works? How does it learn, and how does it scale to the huge amounts of data we feed it?

Akarsh Saxena
Towards Data Science



In this article, we will look into the workings of a simple neural network with only one neuron, and we’ll see how it performs on our “Cat vs. Non-Cat” dataset.

By the end of this article, you will be able to:

  1. Code your own neural network from scratch
  2. Understand how neural networks work
  3. Transform input data to feed into a neural network

The Goal

We’ll be coding our neural network and then using the trained network to determine whether an image contains a cat or not. This type of problem is known as a “Binary Classification Problem”. It consists of classifying the input into two classes, in our case, ‘Cat’ or ‘Not cat’.

The Foundation of a Neural Network

The Linear Regression Equation

A single neuron in a neural network works like a straight line, which has the following equation:

y = mx + b

This is the fundamental equation around which the whole concept of neural networks is built. Let us break down this equation:

y: Dependent variable (Output of the neural network)

m: Slope of the line

x: Independent variable (Input features)

b: y-intercept

Linear equation plotted on a graph (source)

In terms of neural networks, we call the slope the Weights (w), the intercept the Bias (b) and the output (y) z. So the equation becomes:

z = wx + b

Here we have only one feature which we are giving to the model. To input multiple features, we’ll have to scale up the equation.

Scaling up to Multiple Features

The above equation can be scaled to ’n’ input features, which can be written as:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Here we have ’n’ input features fed to our model. Corresponding to each input feature, we have a weight which specifies how important that feature is for predicting the output. The bias term helps shift our line along the axis to better fit the training data; without it, the line would always pass through the origin (0, 0).

Doing It All At Once

We can make use of matrices to multiply all the weights with the inputs and add the biases to them in one shot:

z = X·w + b

Here, each row of X represents a single training example (an image, in our case) and each column represents a single input feature (a pixel value).

In Python, we will use vectorization to implement the above concept.

Note: In the above equation, we have used “X·w + b” because our input matrix is of shape (m × n), where ‘m’ is the number of samples and ’n’ is the number of features.
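The vectorized equation can be sketched in NumPy as follows (the toy batch and values here are illustrative, not the article's dataset):

```python
import numpy as np

# Hypothetical toy batch: m = 4 samples, n = 3 features each.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])      # shape (m, n) = (4, 3)

w = np.array([[0.1], [0.2], [0.3]])  # one weight per feature, shape (n, 1)
b = 0.5                              # scalar bias, broadcast over all rows

# One matrix multiplication computes z for every sample at once.
z = X.dot(w) + b                     # shape (m, 1)
print(z.ravel())
```

A single `X.dot(w)` replaces an explicit loop over samples and features, which is what makes vectorized code both shorter and much faster.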

The goal of training the neural network is to update the weights and biases to get as accurate predictions as we can.

A Neuron


A neuron is a single unit in the neural network. It mimics the neuron in our brain, with ‘Dendrites’ as inputs, the ‘Nucleus’ as the body and the ‘Axon’ as the output. Each neuron takes some input, processes it and gives an output based on an activation function.

If you are unable to grasp any of these concepts yet, bear with me; it will all make sense once we start to put things together.

Coding Our Neural Network

From the creation phase to getting the predictions, the whole process is defined in the following parts:

  1. Preparing the input to feed in our network
  2. Initializing the weights and biases
  3. Forward Propagation
  4. Calculating Loss
  5. Backward Propagation
  6. Updating Weights and Biases
  7. Repeating the above steps multiple times (epochs)
  8. Getting the predictions

Preparing the Input

The input that we have is a collection of images of ‘Cats’ and ‘Not Cats’. Each input image is a coloured image of 64x64 px. We have a total of 209 images in our train dataset and 50 images in our test dataset. To feed these images into our neural network, they must be reshaped into a vector of pixels. So each image will be unrolled, row by row, into a 1-dimensional vector.


Initially, the shape of our inputs was (209, 64, 64, 3) but now after conversion, it will become (209, 64x64x3) i.e. (209, 12288). Now this ‘12288’ is the number of inputs to our neural network and ‘209’ is the number of training examples.

Our images are 8-bit, so each pixel has a value in the range [0, 255], i.e. a range of 256 values in total (2⁸ = 256). Therefore, we normalize our images by dividing each pixel by the maximum value, i.e. 255.

Neural networks are very sensitive to the scale of their inputs. We don’t want our inputs to vary too much, or else features with larger values might dominate the smaller ones. So it is always good practice to normalize our inputs if they span a large range.
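The reshape-and-normalize step can be sketched like this (random stand-in pixels here instead of the real cat images, which the article loads from its dataset):

```python
import numpy as np

# Stand-in for the real dataset: 209 random 8-bit 64x64 RGB "images".
rng = np.random.default_rng(0)
train_x_orig = rng.integers(0, 256, size=(209, 64, 64, 3))

# Flatten each image into one row of 64*64*3 = 12288 pixel values...
train_x_flat = train_x_orig.reshape(train_x_orig.shape[0], -1)

# ...and scale every pixel from [0, 255] into [0, 1].
train_x = train_x_flat / 255.0

print(train_x.shape)  # (209, 12288)
```

The `-1` in `reshape` lets NumPy infer the 12288 automatically from the remaining dimensions.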

Initializing the Weights and Biases

We will have to initialize the weights and biases to some small values to be able to start the training process.
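A minimal sketch of the initialization (variable names are illustrative):

```python
import numpy as np

n_features = 12288  # 64 * 64 * 3 pixel values per image

# Small random weights, scaled down by 0.01, and a zero bias.
w = np.random.randn(n_features, 1) * 0.01
b = 0.0
```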

Here, we multiply the weights by 0.01 so that they do not explode (get very large) during training.

Note: In the code, ‘b’ is just a ‘float’ because of broadcasting: NumPy automatically expands this ‘float’ to the required vector shape.

So now that we have our weights and biases initialized, let’s move on to the next step.

Forward Propagation

Here, we will calculate the output ‘z’ using the equation we established above and then apply an activation function to that output.

We will use the sigmoid activation function for binary classification, because the sigmoid function squashes its input into the [0, 1] interval. We can then easily find the target class by mapping values greater than 0.5 to 1 and values less than 0.5 to 0.

The formula for sigmoid is:

a = σ(z) = 1 / (1 + e⁻ᶻ)

where z is the linear output X·w + b that we computed above.

Let us look at the code to implement these two steps:
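A minimal NumPy sketch of the two forward-propagation steps (the tiny check at the bottom is my own illustration):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) interval.
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, w, b):
    z = X.dot(w) + b   # linear step, shape (m, 1)
    a = sigmoid(z)     # activation step, shape (m, 1)
    return a

# Tiny check: zero weights give z = 0, and sigmoid(0) = 0.5.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
w = np.zeros((2, 1))
a = forward(X, w, 0.0)
print(a.ravel())  # [0.5 0.5]
```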

So now that we have the output of the neuron, we can calculate the loss and see how well (or badly) our model is performing.

Calculating Loss

In binary classification, the loss function used is Binary Cross-Entropy / Log Loss. It is given by the formula:

L = −(1/m) Σᵢ [ yᵢ·log(aᵢ) + (1 − yᵢ)·log(1 − aᵢ) ]

where,

m: Total number of samples in the dataset

yᵢ: True label for iᵗʰ sample

aᵢ: Predicted value for iᵗʰ sample

This can be implemented in Python as:
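One way to implement the loss in NumPy (the small `eps` guard against log(0) is my addition, not from the article):

```python
import numpy as np

def binary_cross_entropy(y, a, eps=1e-8):
    # Average log loss over all m samples; eps guards against log(0).
    m = y.shape[0]
    return -np.sum(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps)) / m

y = np.array([[1.0], [0.0]])
a = np.array([[0.9], [0.1]])
loss = binary_cross_entropy(y, a)
print(loss)  # ≈ 0.105
```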

This loss tells us how far we are from predicting the correct output. If the loss is 0, we have a perfect model. But in practice, a loss of 0 usually means our model is overfitting the data.

Backward Propagation

This is the part where all the magic happens. In every iteration, based on the model’s output and the expected output, we calculate the gradients. The gradients tell us how much to change the weights and biases in order to reduce the loss.

To calculate the gradients, we use the following formulas:

dw = (1/m) · Xᵀ·(a − y)

db = (1/m) · Σᵢ (aᵢ − yᵢ)

In Python, the above equations can be implemented as:
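A NumPy sketch of the gradient formulas (the toy values below are illustrative):

```python
import numpy as np

def backward(X, y, a):
    # Gradients of the log loss w.r.t. the weights and bias.
    m = X.shape[0]
    dz = a - y             # error term, shape (m, 1)
    dw = X.T.dot(dz) / m   # shape (n, 1), one gradient per weight
    db = np.sum(dz) / m    # scalar
    return dw, db

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0], [0.0]])
a = np.array([[0.5], [0.5]])
dw, db = backward(X, y, a)
print(dw.ravel(), db)
```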

Now ‘dw’ and ‘db’ contain the gradients by which we need to adjust our weights and biases, respectively.

Updating the Weights and Biases

We now need to adjust the weights and biases with the gradients we just calculated. For this, the following equations are used:

w = w − α·dw

b = b − α·db

where ‘α’ (alpha) is the learning rate, which defines how big or small our update step should be.

The code for updating the parameters is:
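A minimal sketch of the update step (the learning rate 0.01 is a common default, not a value prescribed by the article):

```python
import numpy as np

def update(w, b, dw, db, lr=0.01):
    # Step each parameter against its gradient, scaled by the learning rate.
    w = w - lr * dw
    b = b - lr * db
    return w, b

w, b = update(np.array([[1.0]]), 0.5, np.array([[10.0]]), 5.0, lr=0.1)
print(w.ravel(), b)  # [0.] 0.0
```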

Now we just have to repeat these steps multiple times to train the neural network.

Training the Neural Network

In order to train the neural network, we must run the above steps for some number of epochs (the number of times to repeat the process).
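The full training loop can be sketched by putting the pieces together (toy separable data here instead of the cat images; hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, epochs=1000, lr=0.5):
    m, n = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(epochs):
        a = sigmoid(X.dot(w) + b)   # forward propagation
        dz = a - y
        dw = X.T.dot(dz) / m        # backward propagation
        db = np.sum(dz) / m
        w -= lr * dw                # update weights and bias
        b -= lr * db
    return w, b

# Toy separable data: label is 1 when the single feature is positive.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
w, b = train(X, y)
preds = (sigmoid(X.dot(w) + b) > 0.5).astype(int)
print(preds.ravel())  # [0 0 1 1]
```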

Now that our neural network is trained, we can predict the output for new input.

Getting the Predictions

To get predictions from our neural network, we convert the output ‘a’ so that values less than 0.5 become 0, and values of 0.5 or more become 1.
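The thresholding step can be sketched as (the example values are illustrative):

```python
import numpy as np

def predict(a, threshold=0.5):
    # Map sigmoid outputs to hard 0/1 labels.
    return (a >= threshold).astype(int)

a = np.array([[0.2], [0.7], [0.5], [0.49]])
print(predict(a).ravel())  # [0 1 1 0]
```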

Although we only have one neuron, we still get 96% accuracy on the training data and 76% accuracy on the test data, which is not bad.

Congratulations! We have just made our first neural network from scratch. In the next article, I will tell you how to develop a shallow neural network which will have one hidden layer containing multiple neurons.

You can find the complete code on Github:

So, this was the whole process of creating and training a neural network from scratch. I hope everything made sense. If anything is still unclear, drop a comment here and I’ll try to resolve your query.

Follow me on Github and LinkedIn for more.
