Understanding Neural Networks

Conrad Koziol
Towards Data Science
6 min read · Oct 28, 2019

Neural networks generate a lot of interest. However, it’s not always clear to people outside the machine learning community what they are, what problems they’re suited for, or how they’re built. We’ll address these topics in this blog post, aiming to make neural networks accessible to all readers. For those with programming experience, I’ve appended a Jupyter Notebook at the end which you can follow to build your own neural network.

Most commercially successful applications of neural networks are in the area of supervised learning. In supervised learning, we are trying to build a model that maps inputs to outputs. Some examples include:

  - Classifying images, such as recognizing handwritten digits or spotting cars in a video stream
  - Flagging emails as spam or not spam based on the words they contain
  - Predicting customer behaviour from past data

We can represent these models as a function that takes an input and produces an output:

y = F(x)

where x is the input, y is the output, and F() is our model. Neural networks are a particularly effective way to build a model (i.e. F()) for many classes of problems.

Let’s briefly consider a traditional approach for building many models. We can derive models describing many phenomena by applying our understanding of calculus to domain-specific knowledge. In physics, this would include Newton’s Laws of Motion, or conservation laws stating that mass, energy, and momentum are conserved in a closed system. This approach lets us successfully build a variety of important models, such as the ideal rocket equation, which tells us how much fuel a rocket needs to reach space, or the Boussinesq equations, which let us model waves along the coast.

What about problems for which we don’t have intuition into the fundamental dynamics? Say you are building an autonomous vehicle and want to recognize other cars on the road using a video stream from your dashboard camera. Even though we’re all quite good at recognizing cars, we haven’t been able to formulate physical principles describing what a car looks like. We can’t point to a combination of wheels, doors, and windows that makes up a car. Neural networks give us a technique to solve these types of problems effectively.

Neural networks work by learning the mapping from input to output directly from data. The process of having the neural network learn this mapping is known as training. Training requires a dataset of training examples, which are pairs of inputs and the corresponding outputs (x and y). For training to be effective, we need a large dataset, typically tens of thousands to tens of millions of training examples.
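
To make that concrete, here is a minimal sketch of what a few training examples might look like as PyTorch tensors. The numbers and the choice of three input features are invented purely for illustration; in a real problem, x might hold the pixel values of an image and y the label we want the network to predict.

```python
import torch

# A toy training set of (input, output) pairs. Values are invented for illustration.
# Each row of x_train is one training example; y_train holds the corresponding target.
x_train = torch.tensor([[0.1, 0.7, 0.3],
                        [0.9, 0.2, 0.5],
                        [0.4, 0.8, 0.6]])   # three examples, three input features each
y_train = torch.tensor([0, 1, 1])           # the target output (label) for each example
```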

During training, we are optimizing the weights (or parameters) of the neural network. For each training example, we run the model on the input and compare the model output to the target output using a loss function. Using an algorithm called backpropagation (or backprop for short), we update all the weights in the network so that the model output will be closer to the target output. The weights are updated in proportion to how much they contribute to the mismatch. We continue cycling through our training set, iteratively updating the model, until performance no longer improves.
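
The loop below is a minimal sketch of this training process in PyTorch. The model, the random data, and the values for the batch size, learning rate, and number of passes are placeholders chosen for illustration, not settings from the notebook at the end.

```python
import torch
import torch.nn as nn

# Placeholder model and data, purely to illustrate the training loop.
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))
loss_fn = nn.CrossEntropyLoss()                          # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # updates the weights

x = torch.rand(16, 3)                  # a batch of 16 inputs
y = torch.randint(0, 2, (16,))         # the corresponding target outputs

for epoch in range(100):               # keep cycling through the training data
    optimizer.zero_grad()              # clear gradients from the previous step
    output = model(x)                  # run the model on the input
    loss = loss_fn(output, y)          # compare model output to the target output
    loss.backward()                    # backpropagation: how each weight contributed to the loss
    optimizer.step()                   # nudge the weights to reduce the loss
```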

Let’s look at a visualization of a straightforward neural network. On the left-hand side we have the input layer. This is our data, such as the pixels of an image, or how many times certain words appear in an email. Next we have two hidden layers; ‘hidden layers’ is the term for the layers between the input and output layers. Finally, we have the output layer. As the input passes through each layer of the neural network, it undergoes a series of computations. Each ‘unit’ (or ‘neuron’) in the hidden layers and the output layer contains a set of weights to be optimized, and these weights control the calculations.
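
As a sketch, a network like the one just described (an input layer, two hidden layers, and an output layer) could be written in PyTorch as below. The layer sizes are arbitrary choices for illustration: 784 inputs would correspond to a 28 x 28 pixel image, and 10 outputs to ten possible digit classes.

```python
import torch.nn as nn

# An input layer feeding two hidden layers and an output layer.
# Layer widths here are illustrative, not prescriptive.
model = nn.Sequential(
    nn.Linear(784, 128),   # input layer (e.g. 28 x 28 = 784 pixels) -> first hidden layer
    nn.ReLU(),             # activation function applied by each unit
    nn.Linear(128, 64),    # first hidden layer -> second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # second hidden layer -> output layer (e.g. 10 digit classes)
)
```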

Designing a neural network requires selecting hyperparameters that control both the architecture of the network and the training process. We call them hyperparameters to distinguish them from the network’s parameters, which is simply another name for the weights. Hyperparameters related to the architecture include the number of layers, the width of each layer (i.e. the number of units), as well as the choice of something called the activation function in the units. Training is particularly influenced by the choice of optimization algorithm, the learning rate used by that algorithm, and whether a technique called regularization is applied.
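
For illustration, these choices might be gathered up like this. The names and values below are hypothetical, not recommendations; the point is simply which knobs are set by hand before training, as opposed to the weights, which are learned.

```python
# Hypothetical hyperparameter choices, fixed before training begins.
hyperparameters = {
    "num_hidden_layers": 2,     # architecture: how many hidden layers
    "hidden_width": 128,        # architecture: number of units per hidden layer
    "activation": "relu",       # architecture: activation function used in each unit
    "optimizer": "sgd",         # training: optimization algorithm
    "learning_rate": 0.01,      # training: step size used by the optimizer
    "regularization": 1e-4,     # training: strength of the regularization penalty, if any
}
```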

Unfortunately, there is no way to know beforehand what the best architecture for your problem is, or what the best hyperparameters for training that architecture would be. Practitioners are guided by a combination of experience, intuition, and best practices from the community. As the field is moving rapidly, this requires continuously staying up to date. Often the best way to start on a problem is to look at the machine learning literature to see if someone has solved a problem similar to yours, and to take their solution as a starting point. Getting a solution to your particular problem will usually require several iterations of looking at the data, trying out ideas, modifying the model code, and testing.

A key question for any neural network is how well it is able to perform on data it hasn’t seen before. Therefore, before training a neural network, a small portion of the data is set aside in what is commonly referred to as the test set. After training, we compare the performance of the neural network on the training set and the test set. One possible scenario is that the model doesn’t perform well even on the data it was trained on. In that case, we say the model has high bias. When the model doesn’t fit the data it has seen well, the hyperparameters should be reevaluated. Another outcome is that the model performs well on the training set, but not very well on the test set. In this situation, we say the model has high variance: the neural network has been overfit to the training data. Equivalently, the network hasn’t learnt features of the data that generalize well, and has perhaps memorized features specific to the training set. To address high variance, we typically employ a technique called regularization. When feasible, acquiring additional data is also beneficial.
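
As a rough sketch, two common ways to add regularization in PyTorch are dropout layers inside the network and weight decay in the optimizer. The layer sizes and values below are illustrative assumptions, not settings from the notebook.

```python
import torch
import torch.nn as nn

# Two common forms of regularization used to combat high variance (overfitting).
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # dropout: randomly zero half the units during training
    nn.Linear(128, 10),
)
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,     # weight decay: penalize large weights
)
```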

An important detail is that the data you train your neural network on has to be similar to the data you apply it to. Statistically, you want your training and test data to come from the same distribution. Intuitively, this means that if you train your autonomous vehicle to drive exclusively in sunny weather, you can’t expect it to stay on the road during a snowfall. In practice, it means that if you develop a neural network to predict customer behaviour, you’ll need to update your model periodically as important factors such as your products, customers, and competition evolve.

These are the broad concepts key to understanding how to apply neural networks and to communicating with those who regularly work with them. After reading this, you should be able to confidently navigate a conversation on applying neural networks.

To summarize the key points about neural networks:

  1. They are an effective way to build a model mapping from input to output directly from a training dataset.
  2. Neural networks have many hyperparameters, and designing a good network is an iterative process.
  3. Your training and test data should come from the same distribution.

Finally, a lot of software developers I speak to are keen to know the implementation details of neural networks. The following Jupyter Notebook has been designed with you in mind, implementing a neural network that recognizes handwritten digits with 98% accuracy. We’ll assume you have a Python environment set up with PyTorch. If you don’t, Anaconda is the recommended package manager and is pretty simple to get started with.

You can download the Jupyter notebook to run on your own machine here, or follow along with the precomputed version here.

Originally published at https://inletlabs.com.
