Neural Networks Simplified!

DITI MODI
Towards Data Science
10 min read · Jul 18, 2018



How long have you been waiting to implement your first ANN without finding the right resource with a simplified explanation? Well, here it is. Before going through this post, I would suggest getting a clear picture of neural networks from my last post here, so that the explanation that follows feels comprehensible.

Before diving deeper into neural networks, I would like to discuss their applications in general, so as to answer the question: where would I find neural networks applicable in real life?


APPLICATIONS IN REAL LIFE:

  1. Have you heard about a machine predicting stock market prices? So cool, isn’t it? Since neural networks are really good at processing complex data and helping analyse it, stock prediction is one application where they work seamlessly well. Similar predictive applications, like the valuation of a property or employee job performance, are also largely built using neural networks.
  2. In image processing and computer vision, perhaps the most important application of all, neural nets offer a great advantage over traditional gray-scale conversion methods. Image-to-text conversion, image restoration, image compression, pattern recognition, or the most famous of all, object detection and face recognition, are just a few of the major examples.

3. The Generative Adversarial Network (GAN) is another application, in which a neural network can generate images on its own. It can be greatly beneficial for precise image generation and for merging two images. This is among the latest applications of neural nets and is being researched widely.

4. Voice recognition, weather prediction, fingerprint recognition, handwriting recognition, etc. are other smaller-scale applications using neural nets.

5. Medical diagnosis is an application where major improvements can be made using neural nets.

Can you guess the application of neural networks in your everyday life?

They are used by your mobile phone and the apps you use almost every day. The automatic face detection in your phone’s camera is the most common use of Convolutional Neural Networks, and the automatic recognition of faces by Facebook and by Google Photos are likewise built on these Convolutional Neural Networks. The list of applications is endless.

Now we know that neural networks are a jackpot! I am sure you are now even more interested in knowing about them.

How to go about this blog…

I would like to set forth the theoretical concepts and logic behind a typical ANN alongside its practical implementation, which really deepens our understanding of the theory. Hence I have divided the entire blog into alternating parts: the practical implementation, the theory behind it, and the logic used to implement it. It is all one implementation with a continuous flow; the parts are just separated for better understanding and correlation.

Do you know how the idea of neural nets actually emerged?

It was inspired by our own biological neural network! Its working closely resembles the way layers upon layers of neurons work together to form a biological neural network. In an artificial neural network, the flow and operational qualities are all simulated from the biological one, since the basic principle we want to model is the human brain’s ability to learn from examples, as I mentioned in my previous blog.

WE THUS BEGIN…

We will implement an ANN model in Python with the TensorFlow library, using the famous MNIST dataset of handwritten digits to classify the digits 0 to 9.

PREREQUISITES:

TensorFlow is one of the most efficient and widely used libraries for implementing neural networks in Python. I prefer to use Jupyter Notebook for the implementation; however, you can use any IDE of your choice, such as Spyder or Sublime Text 3.

Before we start the program, we need to install TensorFlow. The installation is quite easy and can be done from here. The CPU version is sufficient for a normal desktop.

The entire code and output screenshots are available on my GitHub here. Feel free to fork!

THEORY AND TERMS:

To understand the implementation, we first need to introduce four basic terms:

  1. PERCEPTRON: It is a simple single-layer neural network inspired by a biological neuron, as shown below.
courtesy: https://towardsdatascience.com/from-fiction-to-reality-a-beginners-guide-to-artificial-neural-networks-d0411777571b

2. INPUT: Input is simply the data we want to feed to the network. It is usually denoted x1, x2, …, xn. Each input is a sample or independent variable used to determine the output. Input can be of any form: text, images, or video. Here we have images of handwritten digits as input.

3. OUTPUT: The output is generated by the respective neuron whose activation fires.

4. ACTIVATION FUNCTION: The input signal, after going through an activation function, can be processed further by other layers or can directly generate the output. An activation function can essentially be seen as the processing unit that turns an input into an output. It is used to map the output to a range such as 0 to 1 or -1 to 1, depending on the type of activation function. There are various types of activation functions, including sigmoid, step, and exponential functions.
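To make this concrete, here is a minimal NumPy sketch of two such functions; the names sigmoid and step are just illustrative:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    # Fires (outputs 1) only when the input is at or above the threshold of 0
    return np.where(z >= 0, 1, 0)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # approximately [0.12, 0.5, 0.88], mapped between 0 and 1
print(step(z))     # [0, 1, 1], a hard threshold
```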

An input corresponds to the dendrites of a biological neuron, which act as the receiver when information is transmitted from one neuron to another. The remaining analogies follow similarly from the functions defined above.


IMPLEMENTATION:

Import Dependencies

Next we import our dependencies, that is, TensorFlow and the MNIST data that we are going to use for training.
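The original code appeared as a screenshot; a minimal sketch of this step, assuming the TensorFlow 1.x API this post was written against:

```python
import tensorflow as tf
# Helper bundled with TensorFlow 1.x for downloading and reading MNIST
from tensorflow.examples.tutorials.mnist import input_data
```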

Read data

Next, we read the dataset and save it to the variable “mnist”, specifying the path as the first parameter and one_hot as True. Now, what is one_hot, you may wonder.
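Roughly like the following; the download path "/tmp/data/" is just an assumed example:

```python
# Downloads MNIST to the given path (if not already there) and loads it;
# one_hot=True encodes each label as a 10-element binary vector
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
```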

LOGIC USED IN APPLICATION:

FOR INPUT:

In an ANN implementation, a neuron is essentially a placeholder that holds a numerical value; here, each neuron stores a gray-scale value ranging from 0 (black) to 1 (white).

Together, these neurons form the input: a black-and-white image.

courtesy: https://gfycat.com/gifs/tag/neural+networks

FOR OUTPUT:

one_hot is set to True so that exactly one output node is activated for the corresponding output.

For example, here the total number of possible outcomes is 10, since we have the digits 0–9 to be identified. So for any image provided as input, exactly one neuron out of the 10, corresponding to the output identified by our model, would be lit up. For instance,

for digit 0=[1 0 0 0 0 0 0 0 0 0],

for digit 1=[0 1 0 0 0 0 0 0 0 0] so on…

where each array contains the set of binary values for the respective output neurons.

So we conclude that,

The first layer of a neural network is always the input layer, while the last layer is always the output layer.

THEORY AND TERMS:

Then what about the middle layers, where the actual processing takes place?

These layers are the hidden layers. This is where all the magic happens!

An Artificial Neural Network is inspired by a biological neural network, which consists of multiple neurons working simultaneously and in coordination to perform various tasks for our body. Thus we have multiple layers of neurons in an ANN as well, for better performance. In general, the deeper we go the better, but for a small dataset this becomes a problem.

So how to decide the number of hidden layers to be used?

There can be as many hidden layers as we like, but the choice also depends on the size of the dataset. Though more hidden layers are generally better as the net becomes deeper, for a smaller dataset they would reduce the accuracy instead of increasing it.

IMPLEMENTATION:

We choose 3 hidden layers for our model, with 500, 600, and 300 nodes in those layers respectively.

Defining total number of nodes in hidden layers
Defining variables

Later we define the total number of output classes, the batch size, and the tensors (placeholders) that represent our input x and output y as variables.
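Put together, those two steps look roughly like this; the batch size of 100 is an assumed value, while the node counts 500, 600, and 300 are the ones chosen above:

```python
# Total number of nodes in each hidden layer
n_nodes_hl1 = 500
n_nodes_hl2 = 600
n_nodes_hl3 = 300

n_classes = 10    # one output class per digit, 0-9
batch_size = 100  # images processed together per training step (assumed value)

# Placeholders: each 28x28 image is flattened into 784 gray-scale values
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')
```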

LOGIC USED IN APPLICATION:

PROCESSING IN HIDDEN LAYERS:

Analogous to biological neurons, in an ANN the 784 input neurons, each holding a specific gray-scale value, fire (trigger) only specific neurons in the next hidden layer, which in turn trigger some pattern in the layer after it, and finally a specific pattern in the output layer. The brightest neuron is chosen by the network as the final output.


What the hidden layers essentially do is separate a digit into the loops and lines that form it. For example, below we see that 9 has one loop and one line, which combine to form the digit. A specific pattern is thus generated at the various layers, different from the pattern generated for 8, which has two loops. This is why we need deeper nets to identify the key patterns in larger datasets with varying features and huge images.


In short:

Pixels → Edges → Patterns → Digits

THEORY AND TERMS:

Now, how do we make our model fire only specific neurons for specific patterns?

The answer is by using “weights” and “biases”.

5. WEIGHTS: Weights play the role of the axon in a biological neuron. The axon, as we see in the image, transmits information from one neuron to another; it is the connection link. This is very important, as weights decide the excitation and inhibition of the signal to be communicated. Simply put, it follows a reward-punishment scheme: an excitatory weight causes one neuron to activate another, and the opposite holds for inhibitory ones. This is the same as the biological transfer of messages in the neural network.

6. BIAS: It can be visualized as a switch that is always on, that is, its value is always one. The basic idea is that human neurons take a bunch of inputs and “add” them together. If the sum of the inputs is greater than some threshold, then the neuron will “fire” (produce an output that goes to other neurons). This threshold is essentially the same thing as a bias. In this way, the bias in artificial neural nets helps to replicate the behavior of real, human neurons.

IMPLEMENTATION:

So we initially define random weights and biases for each hidden layer and the output layer, sized by the total number of neurons defined for that layer.

Define weights and biases

After defining all our variables, we start defining our model and the functions of the layers using weights and biases.

Activation function
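A sketch of this step, following the common TensorFlow 1.x pattern for this kind of tutorial; the name neural_network_model and the dictionary layout are illustrative:

```python
def neural_network_model(data):
    # Random starting weights and biases, sized by the neurons in each layer;
    # learning will later adjust these values
    hidden_1 = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2 = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    hidden_3 = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases': tf.Variable(tf.random_normal([n_classes]))}

    # Each layer computes (input * weights) + biases, then applies ReLU
    l1 = tf.nn.relu(tf.add(tf.matmul(data, hidden_1['weights']), hidden_1['biases']))
    l2 = tf.nn.relu(tf.add(tf.matmul(l1, hidden_2['weights']), hidden_2['biases']))
    l3 = tf.nn.relu(tf.add(tf.matmul(l2, hidden_3['weights']), hidden_3['biases']))

    # Output layer: raw scores (logits), one per digit class
    return tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases'])
```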

LOGIC USED IN APPLICATION:

We assign positive weights to excite/fire a neuron, and negative weights to inhibit/stop a neuron from firing, for a given pattern in that layer.

To compute the values mathematically, we multiply each input by its respective weight across the entire layer, sum the products, and add the bias:

weighted sum = (weight × input) + bias

We then feed this weighted sum to the activation function called ReLU (Rectified Linear Unit). The activation is simply thresholded at zero.

It computes the function f(x) = max(0, x). It is the most used activation function in deep learning, as it greatly accelerates training.
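For instance, a quick check of the thresholding behaviour:

```python
# ReLU keeps positive values and clamps negatives to zero: f(x) = max(0, x)
with tf.Session() as sess:
    print(sess.run(tf.nn.relu([-3.0, 0.0, 2.5])))  # [0. 0. 2.5]
```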

THEORY AND TERMS:

So where exactly is the learning part here of “deep learning”?

If you remember, we initialized random weights and biases at the start because we were not yet aware of the right values.

We ultimately aim to minimize the error the model makes in predicting an output. Learning thus specifically means determining these weights and biases in such a way that the error is minimized. This is done by the optimizers.

Here we optimize our model using the Adam optimizer to minimize the cost. Adam uses a stochastic gradient descent method to optimize our model. There are various other optimizers available that use different methods.

IMPLEMENTATION:

The last part, after defining the model, is the definition of how the model should be trained.

We predict the output for each input x and then determine the cost/error that our model has committed by comparing the predicted value with the correct labeled value y in our dataset.

Prediction of output

We define the session to loop until the total epoch number is reached. We train on the input in batches and run the optimizer.

We then use the “correct” variable to check the one_hot value of the predicted output against the expected value.

Finally, we calculate and print the accuracy of our model to see its effectiveness.

The last line calls the training function to initiate the training.
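Gathering those steps, the training code looks roughly like this; the function name train_neural_network and the 10 epochs are illustrative assumptions, and y is read from the placeholder defined earlier:

```python
def train_neural_network(x):
    prediction = neural_network_model(x)
    # Cost: cross-entropy between predicted scores and the one_hot labels y
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    # Adam adjusts all weights and biases to minimize the cost
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 10  # total passes over the training data (assumed value)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            epoch_loss = 0
            # Train in batches: fetch a batch, run the optimizer, accumulate cost
            for _ in range(int(mnist.train.num_examples / batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost],
                                feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print('Epoch', epoch + 1, 'of', hm_epochs, 'loss:', epoch_loss)

        # "correct" compares the brightest (argmax) output neuron with the label
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:',
              accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

# Initiate training with the input placeholder x
train_neural_network(x)
```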

LOGIC USED IN APPLICATION:

How long does our model train itself?

The model keeps training until the error reaches a predefined threshold, or until the defined total number of epochs (loops) is completed.

We can tweak many parameters, like the number of hidden layers, the number of nodes, or the optimizer used, to check and lower the error rate.

Essentially the work of a neural network can be summarized as:

Summarizing the process

Phew! That was a bit overwhelming for a beginner, wasn’t it?

I know how that feels at the beginning :/

But trust me, it will all feel basic once we are thorough with it after some practice. For any doubts, comment down below. Let me know your views about this post and any topic you would want me to post about next.
