How Many Neurons Should Your Neural Network Have?
As data scientists, we face many decisions when building a model: which model to use, how large it should be, how much data is needed, which features to add, or whether to use regularization. While some of these decisions can be evaluated relatively quickly, others, such as gathering more data, may take months, only for you to find out that it did not help.
In this article, I would like to practically demonstrate an approach to making these decisions by looking at bias and variance, an approach that changed the way I proceed in almost every project.

Bias and Variance in Practice
A common mistake when tuning a model is looking solely at the validation error. While that is ultimately the most important number (apart from the test error), looking at the training error at the same time can give you several hints about where to take your model.
There are more formal definitions of bias and variance, but in practice they come down to this:
Bias is the error rate on your training set (1 - training accuracy)
Variance is how much worse the model performs on the validation set than on the training set (training accuracy - validation accuracy)
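Expressed as code, these two definitions are just simple arithmetic (a minimal sketch; the helper name is mine, not from the article):

```python
def bias_variance(train_acc, val_acc):
    """Compute bias and variance from accuracies given as fractions in [0, 1]."""
    bias = 1.0 - train_acc          # error rate on the training set
    variance = train_acc - val_acc  # how much worse validation is than training
    return bias, variance

# The numbers from scenario #1 below:
bias, variance = bias_variance(0.6283, 0.6012)
print(f"Bias: {bias:.2%}, Variance: {variance:.2%}")  # Bias: 37.17%, Variance: 2.71%
```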
I will demonstrate the importance of these concepts with a practical example.
Creating Dataset
Let’s first create a dataset that we will use to train and evaluate our models.
I will do that using sklearn’s make_classification, and afterwards split the dataset into training and validation set.
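A minimal version of this step might look like the following (the exact `make_classification` parameters used in the article are not shown, so the values here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic binary classification dataset
# (sample and feature counts are assumptions, not the article's exact values)
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=10,
    random_state=42,
)

# Hold out 20% of the data as the validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```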
Scenario #1: High Bias, Low Variance
Next, we will start off by creating a relatively small Neural Network using Keras, and training it on our dataset.
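A sketch of such a small network in Keras might look like this (the layer sizes and hyperparameters are my assumptions, not the article's exact architecture):

```python
from tensorflow import keras

# A deliberately small network: two tiny hidden layers
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),              # 20 input features (an assumption)
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training would then look something like:
# model.fit(X_train, y_train, epochs=50, validation_data=(X_val, y_val))
```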
The trained model gets the following result:
Training accuracy: 62.83%
Validation accuracy: 60.12%
Bias: 37.17%
Variance: 2.71%
We can see that our model has very high bias and relatively low variance. This state is commonly known as underfitting.
There are several methods to reduce bias and get us out of this state:
Increase the model's size
Add more features
Reduce regularization
Scenario #2: Low Bias, High Variance
Let's try increasing the model's size to reduce bias and see what happens.
In order to do that, I increased the number of neurons in every hidden layer.
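That change might look like this, keeping the same depth but widening each hidden layer (the specific widths are assumptions for illustration):

```python
from tensorflow import keras

# Same architecture as before, but with much wider hidden layers
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(128, activation="relu"),   # was 4 neurons in the small model
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```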
Our bigger model gets the following result:
Training accuracy: 100.0%
Validation accuracy: 89.82%
Bias: 0.0%
Variance: 10.18%
As you can see, we have successfully reduced the model's bias; in fact, we have eliminated it completely. However, the variance has now increased. This state is commonly known as overfitting.
The methods to reduce the model's variance are:
Decrease the model's size
Decrease the number of features
Add regularization
Add more data
Scenario #3: Low Bias, Low Variance
This time, let's try to reduce variance by introducing some regularization into our model.
I added regularization to every hidden layer in the form of Dropout (randomly ignoring a subset of neurons during training).
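Applied to the bigger model from scenario #2, that might look like this (the 0.5 dropout rate and the layer sizes are assumptions, not the article's exact values):

```python
from tensorflow import keras

# The wide model again, now with a Dropout layer after each hidden layer
model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),                    # randomly zero 50% of activations
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Note that Dropout is only active during training; at evaluation time the layers are a no-op, so validation accuracy is measured on the full network.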
The result of our new model is:
Training accuracy: 98.62%
Validation accuracy: 95.16%
Bias: 1.38%
Variance: 3.46%
Perfect! We are now very close to an optimal state with relatively low bias as well as relatively low variance, which is exactly what we were aiming for. If we now look at our validation error (1-validation accuracy or bias+variance), it is the lowest it has been so far.
You may have noticed that in this last scenario, the bias increased slightly again compared to scenario #2. You can also see that the methods for reducing bias and the methods for reducing variance are exact opposites of each other. This property is called the Bias–Variance Tradeoff, and it is demonstrated in the following graph:

Basically, we are trying to find a balance between bias and variance that minimizes the total error.
Conclusion
We went through three different scenarios of tuning a model based on bias and variance, and the corresponding steps that can be taken in each.
There is potentially a fourth scenario of having both high bias and high variance, which has not been covered here. However, this usually means something is wrong with your data (a mismatch between the training and validation distributions, noisy data, etc.), so it is difficult to give an exact guideline.
I hope that this approach will help you with prioritizing tasks in your projects, and in the end save you some time.
All the code that I used can be found on GitHub.
Inspired by Machine Learning Yearning by Andrew Ng.