A Beginner’s Guide to Neural Networks: Part One
The motivation behind neural networks, and the architecture behind the most basic one: the perceptron.
Humans are incredible pattern-recognition machines. Our brains process ‘inputs’ from the world, categorize them (that’s a spider; that’s ice-cream), and then generate an ‘output’ (run away from the spider; taste the ice-cream). And we do this automatically and quickly, with little or no effort. It’s the very same system that senses that someone is angry at us, or involuntarily reads the stop sign as we speed past it. Psychologists call this mode of thinking ‘System 1’ (coined by Keith Stanovich and Richard West), and it includes the innate skills — like perception and fear — that we share with other animals. (There’s also a ‘System 2’, and if you’d like to know more about this, check out Thinking, Fast and Slow by Daniel Kahneman).
So what does this have to do with neural networks? I’ll get there in a second.
Think of a handwritten digit, say a hastily scribbled 5. You recognize it effortlessly; you just know it’s a 5 without really having to think about it. Surely your brain didn’t go, “Ah, that looks like two orthogonal lines connected to a rotated, baseless semi-circle, so that’s a 5.” Designing explicit rules to recognize a handwritten digit is unnecessarily complicated, which is why, historically, it’s been quite difficult for computer programs to do it.
Neural networks loosely mimic the way our brains solve the problem: by taking in inputs, processing them and generating an output. Like us, they learn to recognize patterns, but they do this by training on labelled datasets. Before we get to the learning part, let’s take a look at the most basic artificial neuron, the perceptron, and see how it processes inputs and produces an output.
The Perceptron
Perceptrons were developed way back in the 1950s and ’60s by the scientist Frank Rosenblatt, inspired by earlier work from Warren McCulloch and Walter Pitts. While other models of artificial neurons are used today, they follow the general principles set by the perceptron.
So what are they, anyway? A perceptron takes several binary inputs, x1, x2, …, and produces a single binary output.
Let’s understand this better with an example. Say you bike to work. Your decision to go in depends on two factors: the weather must not be bad, and it must be a weekday. The weather’s not that big a deal, but working on weekends is a big no-no. Since the inputs have to be binary, let’s phrase the conditions as yes-or-no questions. Is the weather fine? 1 for yes, 0 for no. Is it a weekday? 1 for yes, 0 for no.
Remember, I cannot tell the neural network these conditions; it has to learn them for itself. How will it know which piece of information is most important in making its decision? It does this with something called weights. Remember when I said that the weather’s not a big deal, but the weekend is? Weights are just a numerical representation of these preferences: a higher weight means the network considers that input more important than the others.
For our example, let’s purposely set suitable weights: 2 for weather and 6 for weekday. Now how do we calculate the output? We simply multiply each input by its respective weight, and sum up the values across all the inputs. For example, if it’s a nice, sunny (1) weekday (1), we would do the following calculation:

(2 × 1) + (6 × 1) = 8
This calculation is known as a linear combination. Now what does an 8 mean? We first need to define a threshold value. The perceptron’s output, 0 or 1 (stay home or go to work), depends on whether the linear combination is at least as large as the threshold. Let’s say the threshold value is 5: if the calculation gives you a number less than 5, you can stay at home, but if it’s equal to or more than 5, then you gotta go to work.
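To make the moving parts concrete, here’s a minimal sketch of this decision rule in Python. The function name and structure are my own illustration; the weights (2 and 6) and the threshold (5) come straight from our example.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs clears the threshold, else 0."""
    # Linear combination: multiply each input by its weight and add it all up.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Our commuting example: weight 2 for weather, weight 6 for weekday.
weights = [2, 6]
threshold = 5

print(perceptron([1, 1], weights, threshold))  # sunny weekday: 2 + 6 = 8 -> 1 (go to work)
print(perceptron([0, 1], weights, threshold))  # rainy weekday: 6 >= 5    -> 1 (go anyway)
print(perceptron([1, 0], weights, threshold))  # sunny weekend: 2 < 5     -> 0 (stay home)
```

Notice how the heavy weekday weight alone clears the threshold, while fine weather alone doesn’t: that’s exactly the “weekends are a big no-no” preference encoded as numbers.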
You have just seen how weights are influential in determining the output. In this example, I set the weights to particular numbers that make the example work, but in reality, we set the weights to random values, and then the network adjusts those weights based on the output errors it made using the previous weights. This is called training the neural network.
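We’ll save the details of learning for Part Two, but as a preview, here is a hedged sketch of one standard way to do that adjustment: the classic perceptron learning rule. Everything here (the function name, the toy training data, the learning rate and epoch count) is my own illustrative assumption, not something specified above.

```python
import random

def train_perceptron(examples, n_inputs, lr=0.1, epochs=100):
    """Sketch of the classic perceptron learning rule.

    `examples` is a list of (inputs, target) pairs with binary targets.
    """
    # Start from random weights; the bias acts as a movable threshold
    # (the output fires when the weighted sum plus the bias is >= 0).
    weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
    bias = random.uniform(-1, 1)

    for _ in range(epochs):
        for inputs, target in examples:
            total = sum(x * w for x, w in zip(inputs, weights)) + bias
            output = 1 if total >= 0 else 0
            error = target - output  # -1, 0, or +1
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Hypothetical training data for the commuting example:
# (weather_fine, is_weekday) -> go to work?
data = [([1, 1], 1), ([0, 1], 1), ([1, 0], 0), ([0, 0], 0)]
weights, bias = train_perceptron(data, n_inputs=2)
print(weights, bias)
```

Run it a few times and the exact numbers will differ because of the random start, but the weekday weight consistently ends up larger than the weather weight, which is the preference we hand-coded earlier.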
Going back to the handwritten digit recognition problem, a simple one-node network like the one above will not be capable of making such complex decisions. To enable that, we need more complicated networks, with more nodes and hidden layers, using techniques such as a sigmoid activation function to make decisions, and backpropagation to learn. All that in Part Two!
Resources
- Using neural nets to recognize handwritten digits by Michael Nielsen (more detailed explanations at a similar level).
- The Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (more advanced and technical; assumes undergrad-level math knowledge).
- Udacity Deep Learning Nanodegree Foundation (17-week project-based course; difficulty sits somewhere between the first two resources).
- TensorFlow Neural Network Playground (fun, interactive visualization of neural networks in action).