Build Neural Network From Scratch — Part 2

Jose Fumo
Towards Data Science
5 min read · Aug 24, 2017


A Gentle Introduction To Neural Networks Series (GINNS) — Part 2

Introduction

In this post, we are going to build a Perceptron for the AND logic gate, and we are going to build it from scratch using Python and NumPy. Here I'm assuming that you have read A Gentle Introduction To Neural Networks Series — Part 1 and that you are already familiar with the basic concepts of neural networks. If you are new to this topic, I strongly advise you to start from Part 1. Are you still here? Great. This is going to be one of the shortest posts I have written so far, because I want to keep things simple while you build a good intuition and understanding of neural networks. So let's do this, but first, let me explain why we would build such a simple neural network from scratch at all.

Why implement a neural network from scratch?

But why implement a neural network from scratch at all? I imagine what you really want is to create powerful neural networks that can predict whether the person in your picture is George Lucas or Luke Skywalker. No? Oh, ok, I don't blame you.

Now, jokes aside: even if you plan on using neural network libraries like TensorFlow, Keras, or another one in the future, implementing a network from scratch at least once is an extremely valuable exercise. It helps you gain an understanding of how neural networks work, and that is essential for designing effective models. So, I hope I have convinced you to stay.

Single-layer Neural Network (Perceptron)

The Perceptron algorithm is the simplest type of artificial neural network. It is inspired by the information processing of a single nerve cell, called a neuron.

Perceptron

In a Perceptron, to compute the output, we multiply each input by its respective weight, sum the results, and compare the sum with a threshold value. A generalization of this is to say that the weighted sum of the inputs is passed through a step (activation) function, as shown below:

Weighted sum passed to an activation function.
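To make this concrete, here is a tiny NumPy sketch of that computation; the input, weight, and threshold values below are made up purely for illustration:

    import numpy as np

    x = np.array([1, 0, 1])            # example inputs (made up for illustration)
    w = np.array([0.5, 0.75, 0.25])    # example weights (made up for illustration)
    threshold = 0.5

    weighted_sum = np.dot(x, w)        # 1*0.5 + 0*0.75 + 1*0.25 = 0.75
    output = 1 if weighted_sum > threshold else 0
    print(weighted_sum, output)        # 0.75 1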

Activation Function

The activation function of a node defines the output of that node given an input or set of inputs. A standard computer chip circuit can be seen as a digital network of activation functions that can be “ON” (1) or “OFF” (0), depending on input. For our example, we are going to use the binary step function as shown below:

step function
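In code, the binary step function is a one-liner. The sketch below assumes the threshold is 0, so anything above 0 turns the output "ON"; the threshold of 0 is an illustrative choice, the idea is the same for any threshold:

    import numpy as np

    def step(z):
        # Binary step activation: 1 ("ON") if z is greater than 0, otherwise 0 ("OFF").
        return np.where(z > 0, 1, 0)

    print(step(np.array([-1.5, 0.0, 2.3])))  # [0 0 1]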

Here is the Code!

Perceptron
  1. Load the input (X) and output (y).
  2. Initialize the weights and other parameters: the learning rate is a configuration parameter that controls how much the weights are updated, and a training step (also called an epoch) is a single pass of forward and backward propagation.
  3. Compute the dot product between the input (X) and the weight matrix (W). Here we use np.dot() to do the matrix multiplication; the dot product is also called the inner product. For example, the inner product between X and W is x1*w1 + x2*w2 + x3*w3 + … + xn*wn.
  4. Apply the activation function to get l1. l1 is the actual output of our network.
  5. Compute the loss, i.e. the errors the model made. When we start training, the network guesses its predictions, and then we compare the predicted class with the true one. Ideally, what we want when we train a neural net is to find the optimal weights (W) that best minimize the error of our model.
  6. Compute the change factor, which we call the update. If our model predicts the class correctly, the error is zero and no changes are made to the weights; if it makes a mistake, the update on the weights can be negative (decrease) or positive (increase).
  7. We train our model for 100 training steps using online (stochastic) learning, which means that a single training example, picked at random, is used to make each parameter update. A sketch that puts all of these steps together follows below.
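Since the original gist is not reproduced here, the following is a minimal sketch that follows the seven steps above for the AND gate. It uses an explicit bias term b in place of a fixed threshold, and names such as learning_rate, training_steps and b are my own; only X, y, W and l1 come directly from the step descriptions:

    import numpy as np

    np.random.seed(1)

    # 1. Load input (X) and output (y) for the AND gate.
    X = np.array([[0, 0],
                  [0, 1],
                  [1, 0],
                  [1, 1]])
    y = np.array([0, 0, 0, 1])

    # 2. Initialize weights and other parameters.
    W = np.random.random(2)   # one weight per input
    b = np.random.random()    # bias term, playing the role of the threshold
    learning_rate = 0.1
    training_steps = 100

    def step(z):
        # Binary step activation function (threshold at 0).
        return 1 if z > 0 else 0

    # 7. Online (stochastic) learning: one randomly picked example per training step.
    for _ in range(training_steps):
        i = np.random.randint(len(X))   # pick a single training example at random
        # 3. Dot product between the input and the weights.
        z = np.dot(X[i], W) + b
        # 4. Apply the activation function; l1 is the output of the network.
        l1 = step(z)
        # 5. Error: difference between the true class and the predicted class.
        error = y[i] - l1
        # 6. Update: zero if the prediction was correct, negative or positive otherwise.
        update = learning_rate * error
        W += update * X[i]
        b += update

    y_pred = np.array([step(np.dot(x, W) + b) for x in X])
    print(y_pred)   # should settle on [0 0 0 1] for the AND gate

Because the training examples are picked at random, the exact sequence of mistakes varies from run to run, but with a linearly separable problem like AND the predictions should settle on the correct outputs.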

Output after training:

y_pred after training

Use Cases and Limitations

While the perceptron classified the instances in our example well, the model has limitations. The Perceptron is a linear model, and in most cases linear models are not enough to learn useful patterns.

Left: And Gate, Right: Not and OR Logic Gates

Linear models like the perceptron are not universal function approximators. If you were to change the problem to the XOR logic gate, the perceptron would fail, because XOR is not linearly separable, as shown in the image below:

source
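One quick way to see this, without the picture, is to write down what a single weighted-sum-and-threshold unit would have to satisfy on the four XOR inputs (a small back-of-the-envelope check, using the same notation as above):

    0*w1 + 0*w2 ≤ threshold   (output 0 for input (0, 0), so threshold ≥ 0)
    1*w1 + 0*w2 > threshold   (output 1 for input (1, 0))
    0*w1 + 1*w2 > threshold   (output 1 for input (0, 1))
    1*w1 + 1*w2 ≤ threshold   (output 0 for input (1, 1))

Adding the two middle lines gives w1 + w2 > 2*threshold, while the last line requires w1 + w2 ≤ threshold; together these force the threshold to be negative, contradicting the first line. No choice of weights and threshold satisfies all four conditions, which is exactly what the picture shows geometrically.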

Neural networks are known to be universal function approximators, so what are we missing? That's where hidden layers come in, and that is what we are going to talk about next.

What’s Next?

We are going to talk about the power of hidden layers, the processing they are capable of doing, and how we train complex neural networks using backpropagation and gradient-descent-based algorithms. If you are willing to go deeper into the inner details of neural networks, check out the links below. Stay tuned!

References:

Before you go!

If you enjoyed this writing, leave your claps 👏 to recommend this article so that others can see it.

With ❤ by David Fumo.

May The Force Be With You!
