
Imagine this: you’re sitting at your desk, sorting your digital files into folders… again. Piece by piece, you drop them into separate folders on your desktop, and suddenly it’s two hours later. Most of us can relate, but it doesn’t have to be that way. What if there was a computer program that could sort these files for you? That could almost think for itself? And have immaculate accuracy to boot? Meet… drumroll please… the neural network! 🎉
A neural network is a type of computer program that can find patterns in sets of data, and through this it can be trained to do certain tasks much quicker than your average computer program. It learns to predict the outcome for a new piece of data by looking at thousands of different examples that you feed it.
An MLP, or multi-layer perceptron, is a type of neural network composed of many interconnected perceptrons – small mathematical equations. Each perceptron is connected to other perceptrons, and each connection has a strength, or weight, that affects the outcome of the neural network. You may have heard of MLPs before under the name feedforward artificial neural networks.
Neural Networks 🧠
A neural network is a type of Artificial Intelligence, or AI for short. AI is all around you right now – in your Amazon Alexa device, Waymo’s self-driving cars, traffic lights, online shopping recommendations, drug discovery, facial recognition, Roomba vacuums… the list goes on. Neural networks use a series of mathematical algorithms to recognise underlying patterns in a set of data, and they use those patterns to predict outcomes of new data they are given. Artificial neural networks attempt to mimic the way that the human brain operates. Neural networks are composed of neurons – each neuron is simply an equation that takes inputs, multiplies those inputs by a set of weights, and passes the output on to the next neuron. Now, that might sound like a bunch of gibberish, but I’ll explain these concepts further in a bit.
Neurons? Weights? What is this sorcery? 🧙‍♂️
You may be wondering what on earth all of these fancy words are. Trust me, I felt the same way! First of all, it’s important to understand the basics of how a human brain works, as it’s what AI programmers model their Neural Networks after. A human brain is composed of about 86 billion neurons that fire electricity ⚡️ between them when faced with different stimuli. Different parts of the brain are responsible for different things such as emotions, reasoning, and memory. All of these sections are composed of neurons.

An artificial neural network attempts to replicate the way the human brain works, albeit not quite in the same way. As the human brain is SO complex, programmers have to drastically simplify the thinking process and boil it down to a set of mathematical equations that data is run through to produce an output.
MLPs
MLPs are the most basic form of an artificial neural network. They use a series of perceptrons – equations with inputs, outputs, and weights – to turn a series of inputs into a single output between 0 and 1. That output is then fed into another layer of perceptrons, and the process continues until it reaches a single output (or set of outputs, depending on the function of the MLP). An MLP has at least three layers – an input layer, at least one hidden layer, and an output layer. The hidden layer makes it possible to separate data that you can’t categorise linearly – for example, something like this:

The lines dividing the blue and red data points would not be linear – they must be curved in order to best categorise the data. This is exactly the type of dataset that an MLP is well-suited to.

The most important part of an MLP is the weights. Each input of the neural network is passed through "layers", where the input is multiplied by different weights (represented with the letter W). Weights represent the strength of the connection between two neurons.
I like to think of weights like a family. Your brother is closer to you on the family tree than, say, your first cousin. Your mother is closer to you on the tree than your second cousin once removed. These amounts of "relatedness" on the family tree correspond to the weights of perceptrons – a higher weight indicates a sibling, and a lower weight indicates something like a third cousin twice removed. The weights correspond to the strength of a connection.
Here’s a graphic showing how this works in a neural network:

The weights really determine what the output of the neural network is, because they provide the strength of each connection between neurons. Here’s another analogy: the weights of the MLP are like ropes, and some ropes are stronger than others. If the weight is really small, the rope might be a tiny, super-thin thread. But if the weight is really large, the rope could be 3 feet thick!
There’s another important component you might have noticed in the image above, and that is the bias. The bias is simply an offset added to the outcome of the weighted equations. Sometimes you need this offset to get the right output, depending on the situation. For example, if the bias was one, an extra one would be added to the weighted sum of each neuron’s inputs, shifting the final output.
Going back to the previous image, if your inputs were all 1’s and your weights were all 1’s, and the bias was 1, your network would look like this.

Finally, we need to understand how each neuron’s output is calculated.
First, a neuron calculates the weighted sum of the inputs, and adds the bias at the end.

For example, if the inputs are:

And the weights are:

Then a weighted sum is calculated like this, by multiplying each input by its respective weight, then adding all the products together.

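That weighted sum calculation can be sketched in a few lines of Python. The numbers here are hypothetical – they’re just stand-ins for whatever inputs and weights your network happens to have:

```python
# Hypothetical inputs and weights, just to illustrate the calculation
inputs = [2, 3, 1]
weights = [0.5, 0.25, 1.0]

# Multiply each input by its matching weight, then add up the products
weighted_sum = sum(i * w for i, w in zip(inputs, weights))
print(weighted_sum)  # 2*0.5 + 3*0.25 + 1*1.0 = 2.75
```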
Then we add the bias to the weighted sum…

…and finally, the value is fed into the activation function, which then prepares an output. The activation function is simply a mathematical equation that squashes the value into a fixed range – the sigmoid function, for example, turns it into a value between 0 and 1 for the computer to read. Common choices are the sigmoid function, the ReLU function, and the Tanh function. For a simple MLP like this, the sigmoid function is often a good choice, because it provides a smooth transition between zero and one and more variability between perceptron outputs.

If you’re interested in any of the other types of activation functions, feel free to look them up or read this article.

Going back to our previous example, which looked like this:

Each input (let’s set them all to 1 for simplicity) would be multiplied by its respective weight, in this case 1.

Then the weighted inputs are added together to create a weighted sum.

Then we add the bias to the weighted sum…

… and calculate the final product.
Once you’ve passed it through the activation function, you get a number between 0 and 1, depending on which function you use. I passed the number 5 through the sigmoid function, and I got the number 0.993307149076 – this number was then rounded to two decimal places, making 0.99.
This number between 0 and 1 is the "brightness" of the neuron – the closer to 0, the dimmer, and the closer to 1, the brighter. This "brightness" indicates the strength of the neuron.
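Putting the whole neuron together in Python reproduces that number. I’m assuming four inputs of 1 here, so that the weighted sum plus the bias comes to 5, matching the example:

```python
import math

def sigmoid(x):
    # Squash any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus the bias, through the activation function
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(weighted_sum + bias)

# Assumed: four inputs of 1, four weights of 1, and a bias of 1 -> a sum of 5
output = neuron([1, 1, 1, 1], [1, 1, 1, 1], 1)
print(round(output, 2))  # 0.99
```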
If you’re having some trouble with these concepts (I know it’s a lot to take in), I recommend watching this YouTube video!
So that’s how each singular neuron (also known as a perceptron) works – how does this tie into the network as a whole?
MLPs – An Intricate System of Perceptrons
A neural network can be as simple as this:

Or as complicated as this:

It all depends on how many of these neurons, or perceptrons, you use within the network. The main idea of an MLP is that the output of each perceptron becomes one of the inputs of the next layer of perceptrons. By doing this, the possibilities are narrowed down layer by layer until only a small, concise set of outputs remains.
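To make the "output of one layer feeds the next" idea concrete, here’s a tiny made-up MLP in Python – two inputs, three hidden neurons, and one output neuron. All the weights are invented purely for illustration:

```python
import math

def sigmoid(x):
    # Squash any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron in the layer sees ALL of the previous layer's outputs
    return [sigmoid(sum(i * w for i, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Made-up weights: 2 inputs -> 3 hidden neurons -> 1 output neuron
hidden_weights = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]  # one weight list per neuron
hidden_biases = [0.0, 0.0, 0.0]
output_weights = [[1.0, -1.0, 0.5]]
output_biases = [0.0]

hidden = layer([1.0, 2.0], hidden_weights, hidden_biases)  # hidden layer
output = layer(hidden, output_weights, output_biases)      # output layer
print(output)  # a single value between 0 and 1
```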
Training an MLP
Of course, a neural network isn’t going to work perfectly the first time. We start out with random weights, and obviously this doesn’t work very well. So we have to "train" the network on a series of practice inputs so it can adjust its weights to achieve optimum accuracy. To do this, we label a large set of training data with the output that it should give. (This is called supervised learning – there is also unsupervised learning, without labels, which is a whole other topic.) We run this training data through our MLP, and if the answer is wrong, the weights and bias are adjusted. This process is repeated over and over again until the MLP classifies the training data with high accuracy. After this, we run some new data, called test data, through the neural network that it has never seen before.
Think about this like a math test. During the period of learning before the test, your teacher gives you lots of practice questions. You learn to master these questions and soon know how to solve all of them. When you get to the final test, you see some new questions that you’ve never seen before, and you do your best to answer them in the same way that you answered the practice questions.
In a neural network, the training data is akin to the practice questions, and the test data is like the math test.
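Here’s a bare-bones sketch of that supervised training loop for a single neuron. It uses a simplified weight-update rule (real MLPs use backpropagation across all their layers) and a made-up labelled dataset – the logical OR function:

```python
import math

def sigmoid(x):
    # Squash any number into the range (0, 1)
    return 1 / (1 + math.exp(-x))

# Labelled training data: each input pair and the answer it should give (OR)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights = [0.0, 0.0]  # start with arbitrary weights
bias = 0.0
learning_rate = 0.5   # how big each adjustment is

for _ in range(1000):  # show the whole dataset to the neuron 1000 times
    for inputs, target in data:
        output = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
        error = target - output
        # Nudge each weight (and the bias) in the direction that shrinks the error
        weights = [w + learning_rate * error * i for w, i in zip(weights, inputs)]
        bias += learning_rate * error

# After training, the neuron classifies every example correctly
for inputs, target in data:
    output = sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)
    print(inputs, "->", round(output))
```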
So why does this matter?
MLPs are an efficient way to categorise data automatically. Once you’ve trained a neural network, the program will accurately classify data much faster than you or your average computer program could. MLPs can save an astronomical amount of time in everyday life, and they’re more common than you think.
The pixels of an image can be reduced down to one long row of data and fed into an MLP. Then the image can be classified according to patterns found within it. The words of a document can also be reduced to one long row of data and fed to an MLP. The MLP can classify the general idea of the article based on patterns it sees.
MLPs can be used for a very wide variety of tasks; for example, simple image recognition, categorisation of files and data, online shopping recommendations, social media, and more!
MLPs are useful in so many different fields, and now you know a little bit more about them 🤩