
Bias and Variance in Machine Learning

The key to success is finding the balance between bias and variance.

Photo by Aziz Acharki on Unsplash

In predictive analytics, we build Machine Learning models to make predictions on new, previously unseen samples. The whole purpose is to be able to predict the unknown. But the models cannot just make predictions out of the blue. We show some samples to the model and train it. Then we expect the model to make predictions on samples from the same distribution.

There is no such thing as a perfect model, so the model we build and train will have errors: there will be differences between the predictions and the actual values. The smaller that difference, the better the model performs. Our goal is to minimize the error. We cannot eliminate it entirely, but we can reduce it, and the reducible part of the error has two components: bias and variance.
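For squared-error loss, this split is usually summarized by the classical bias–variance decomposition (stated here as a sketch, without the derivation):

```
Expected test error = Bias² + Variance + Irreducible error
```

The irreducible term is the noise floor that no model can remove; bias and variance are the two parts we can trade against each other.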

The performance of a model depends on the balance between bias and variance, and the optimum model lies somewhere between the two extremes. Note that there is always a trade-off between bias and variance; the challenge is to find the right balance.


What are bias and variance?

Bias occurs when we try to approximate a complex relationship with a much simpler model. I think of it as a lazy model. Consider a case in which the relationship between the independent variables (features) and the dependent variable (target) is very complex and nonlinear, but we try to model it with linear regression. In this case, even if we have millions of training samples, we will not be able to build an accurate model: by choosing a simple model, we restrict its performance, and the true relationship between the features and the target cannot be captured.

Models with high bias are not able to capture the important relations, so the accuracy on both the training and test sets will be low. This situation is also known as underfitting: models with high bias tend to underfit. Consider the scatter plot below, which shows the relationship between one feature and a target variable. If we use the red line as the model for the relationship described by the blue data points, our model has high bias and ends up underfitting the data.

High bias , underfitting
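As a small illustration (my own sketch, not from the original article), the snippet below uses NumPy to fit a straight line to synthetic data generated from a sine curve. Because the model is too simple for the true relationship, the error stays high on the training set and the test set alike:

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonlinear ground truth: y = sin(x), plus a little noise.
x_train = rng.uniform(0, 2 * np.pi, 200)
y_train = np.sin(x_train) + rng.normal(0, 0.1, 200)
x_test = rng.uniform(0, 2 * np.pi, 200)
y_test = np.sin(x_test) + rng.normal(0, 0.1, 200)

# A straight line (degree-1 polynomial) is far too simple for a sine wave.
line = np.poly1d(np.polyfit(x_train, y_train, deg=1))

train_mse = np.mean((line(x_train) - y_train) ** 2)
test_mse = np.mean((line(x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

Note that the training and test errors are both large and roughly equal: a high-bias model fails everywhere in the same way, rather than memorizing the training data.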

Variance occurs when the model is highly sensitive to changes in the independent variables (features). The model tries to pick up every detail of the relationship between the features and the target, including the noise in the data, which occurs randomly. A very small change in a feature can change the model's prediction. Because the model captures each and every detail of the training set, the accuracy on the training set will be very high. However, the accuracy on new, previously unseen samples will be poor, because their features will always vary slightly. This situation is also known as overfitting: the model fits the training data but fails to generalize to the actual relationships in the dataset. Consider the same example we discussed earlier. If we model the relationship with the red curve in the image below, the model overfits: as you can see, it is highly sensitive and tries to capture every variation.

High variance, overfitting
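The opposite failure can be sketched the same way (again my own illustrative example): fit a very flexible model to a small noisy sample, and the training error becomes tiny while the test error grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same idea: a nonlinear ground truth with noise, but only 20 training points.
x_train = np.sort(rng.uniform(-1, 1, 20))
y_train = np.sin(3 * x_train) + rng.normal(0, 0.3, 20)
x_test = rng.uniform(-1, 1, 200)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.3, 200)

# A degree-15 polynomial has enough freedom to chase the noise in 20 points.
wiggly = np.poly1d(np.polyfit(x_train, y_train, deg=15))

train_mse = np.mean((wiggly(x_train) - y_train) ** 2)
test_mse = np.mean((wiggly(x_test) - y_test) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The gap between the two numbers is the signature of high variance: the model has memorized the particular noise in its 20 training points instead of the underlying sine curve.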

So neither high bias nor high variance is good. The perfect model would have both low bias and low variance, but such models are very hard to find, if they exist at all. Because of the trade-off between the two, we should instead aim for the right balance, and mastering that balance is the key to success as a machine learning engineer. A preferable model for our case would look something like this:

A good fit
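One common way to search for that balance in practice (a sketch under the same synthetic setup as above, using polynomial degree as the complexity knob) is to sweep model complexity and pick the setting with the lowest error on a held-out validation set:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n noisy samples from the nonlinear ground truth y = sin(3x)."""
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)

x_train, y_train = sample(100)
x_val, y_val = sample(100)

# Sweep model complexity (polynomial degree) and score each on validation data.
val_mse = {}
for deg in range(1, 13):
    model = np.poly1d(np.polyfit(x_train, y_train, deg=deg))
    val_mse[deg] = np.mean((model(x_val) - y_val) ** 2)

best_deg = min(val_mse, key=val_mse.get)
print(f"best degree: {best_deg}")
```

Low degrees lose on validation error because of bias, very high degrees because of variance; the winning degree sits in between, which is exactly the balance the article describes.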

Thank you for reading. Please let me know if you have any feedback.

