
In this article, you’ll learn about the bias-variance-covariance decomposition.
The error of a regression model can be analyzed with the bias-variance trade-off. For ensembles, this error can be further decomposed with a covariance term.
Here’s how you can use this decomposition to improve a forecasting ensemble.
Introduction
Diversity among individual models is a key ingredient for building successful ensembles.
Each model should make accurate forecasts. But these forecasts should also be different from those of the other models. Combining such predictions then reduces the impact of individual errors.
This leads us to two questions:
- How do we measure ensemble diversity?
- How do we introduce diversity in an ensemble?
Let’s dive into these questions.
Measuring Diversity
The bias-variance trade-off is a standard way of analyzing regression models. Bias relates to the average distance between the predictions and the actual values. Variance relates to the variability of the forecasts across different samples.
Typically, reducing bias increases variance, and vice-versa. This trade-off is related to model complexity: increasingly complex models tend to have a lower bias but a higher variance.
A forecasting ensemble is also a regression model. It can be decomposed into these two terms. But, it is better analyzed using a three-way decomposition: the bias-variance-covariance decomposition.
This decomposition is defined as follows:

$$\mathbb{E}\Big[(\bar{f} - y)^2\Big] = \overline{bias}^2 + \frac{1}{M}\,\overline{var} + \Big(1 - \frac{1}{M}\Big)\,\overline{cov}$$

The terms in the equation above are the average squared bias, average variance, and average covariance of an ensemble with M models, where $\bar{f}$ denotes the average forecast of the ensemble.
We already know what the bias and variance terms are. Besides these, the expected error of an ensemble depends on a covariance term. Covariance measures how a pair of models varies together, so it is a good way to quantify diversity. A larger average covariance (i.e. lower diversity) leads to a larger expected error.
You can think of this as follows:
expected error = average bias + average variance – diversity
So, ensemble diversity directly impacts its expected forecasting performance.
Here’s how you can code this decomposition:
import numpy as np
import pandas as pd


class BiasVarianceCovariance:

    @classmethod
    def get_bvc(cls, y_hat: pd.DataFrame, y: np.ndarray):
        return cls.avg_sqr_bias(y_hat, y), cls.avg_var(y_hat), cls.avg_cov(y_hat)

    @staticmethod
    def avg_sqr_bias(y_hat: pd.DataFrame, y: np.ndarray):
        """Average squared bias of the ensemble members.

        :param y_hat: predictions as pd.DataFrame with shape (n_observations, n_models).
            The predictions of each model are in different columns
        :param y: actual values as np.ndarray
        """
        return (y_hat.mean(axis=0) - y.mean()).mean() ** 2

    @staticmethod
    def avg_var(y_hat: pd.DataFrame):
        """Average variance of the ensemble members, scaled by the ensemble size."""
        M = y_hat.shape[1]

        return y_hat.var().mean() / M

    @staticmethod
    def avg_cov(y_hat: pd.DataFrame):
        """Average pairwise covariance between the ensemble members."""
        M = y_hat.shape[1]

        # covariance between models (the columns of y_hat), hence rowvar=False
        cov_df = pd.DataFrame(np.cov(y_hat, rowvar=False))
        # discard the diagonal (each model's variance) and average the pairwise terms
        np.fill_diagonal(cov_df.values, 0)
        cov_term = cov_df.values.sum() * (1 / (M * (M - 1)))

        return cov_term
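As a quick sanity check, here's a minimal usage sketch on synthetic data (the model names and noise levels are made up for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# synthetic actual values and the predictions of three hypothetical models
y = rng.normal(size=100)
y_hat = pd.DataFrame({
    'model_a': y + rng.normal(scale=0.3, size=100),
    'model_b': y + rng.normal(scale=0.3, size=100),
    'model_c': y + rng.normal(scale=0.3, size=100),
})

avg_bias, avg_var, avg_cov = BiasVarianceCovariance.get_bvc(y_hat, y)

print(avg_bias, avg_var, avg_cov)
```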
You can find more details about the bias-variance-covariance decomposition in reference [1].
Creating Diverse Ensembles

The bias-variance-covariance decomposition shows the importance of encouraging diversity in ensembles. How can you do that?
Here are three possible approaches:
- Manipulating the training data;
- Using different algorithms or configurations;
- Pruning the ensemble.
Manipulating the training data
Some of the most successful ensemble methods, such as bagging and boosting, follow this approach.
Bagging builds an ensemble of decision trees. The available data is resampled with a bootstrapping technique for each tree. So, each tree gets a different training set, which introduces diversity. Random Forests do bootstrapping, and more: they also add randomness to the way explanatory variables are used, which further increases the diversity among the trees.
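Here's a minimal sketch of the bootstrapping idea, assuming hypothetical X_train and y_train numpy arrays (this is illustrative, not the scikit-learn implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagging_fit(X_train, y_train, n_trees=100, seed=1):
    # each tree is trained on a different bootstrap sample of the rows
    rng = np.random.default_rng(seed)

    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, X_train.shape[0], size=X_train.shape[0])
        trees.append(DecisionTreeRegressor().fit(X_train[idx], y_train[idx]))

    return trees


def bagging_predict(trees, X):
    # the ensemble forecast is the average of the individual trees
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```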
Boosting also changes the input data, but in a different way. One key aspect is that the models are trained sequentially. After each iteration, the training instances are re-weighted according to previous errors.
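Here's a rough sketch of the sequential re-weighting idea, again on hypothetical X_train and y_train arrays; the weight update rule below is illustrative, not a faithful implementation of any particular boosting algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boosting_sketch(X_train, y_train, n_rounds=50):
    # start with uniform instance weights
    weights = np.ones(X_train.shape[0]) / X_train.shape[0]

    models = []
    for _ in range(n_rounds):
        model = DecisionTreeRegressor(max_depth=3)
        model.fit(X_train, y_train, sample_weight=weights)

        # instances with larger errors get larger weights in the next round
        errors = np.abs(y_train - model.predict(X_train))
        weights *= 1 + errors / (errors.max() + 1e-12)
        weights /= weights.sum()

        models.append(model)

    return models
```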
Using different algorithms or configurations
Varying the algorithm is a quick and easy way of improving ensemble diversity.
Different methods (say, a decision tree and a linear regression) make different assumptions about the data. This leads to models that make different kinds of errors.
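For instance, here's a minimal sketch that combines a decision tree and a linear regression using scikit-learn's VotingRegressor (the max_depth value is an arbitrary choice for illustration):

```python
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# combining models with different assumptions about the data
heterogeneous_ensemble = VotingRegressor(
    estimators=[
        ('lr', LinearRegression()),
        ('dt', DecisionTreeRegressor(max_depth=5)),
    ]
)

# heterogeneous_ensemble.fit(X_train, y_train)
```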
Ensemble Pruning
Another way of improving diversity is by ensemble pruning.
Pruning refers to the process of removing unwanted models from the ensemble. In this case, you would discard highly correlated models. This results not only in better diversity but also in lower computational costs.
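Here's a minimal sketch of a correlation-based pruning rule; the greedy selection and the 0.95 threshold are illustrative choices, not a specific method from the literature:

```python
import pandas as pd


def prune_correlated(y_hat: pd.DataFrame, threshold: float = 0.95):
    # y_hat holds each model's predictions in a separate column
    corr = y_hat.corr().abs()

    keep = []
    for col in y_hat.columns:
        # keep a model only if it is not too correlated with an already kept model
        if all(corr.loc[col, kept] < threshold for kept in keep):
            keep.append(col)

    return y_hat[keep]
```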
Case Study: Decomposition of a Random Forest
Let’s use the bias-variance-covariance decomposition to analyze the error of a Random Forest.
In this example, we’ll use a time series about sunspots.
![Monthly sunspots time series [3]. Image by author.](https://towardsdatascience.com/wp-content/uploads/2023/01/18nI3TdOP1ctjcfkPh62IzQ.png)
You can train a Random Forest for forecasting as follows:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from pmdarima.datasets import load_sunspots
# src module here: https://github.com/vcerqueira/blog
from src.tde import time_delay_embedding
from src.ensembles.bvc import BiasVarianceCovariance
# load time series
series = load_sunspots(as_series=True) # GPL-3
# train test split
train, test = train_test_split(series, test_size=0.3, shuffle=False, random_state=1)
# time series for supervised learning
train_df = time_delay_embedding(train, n_lags=12, horizon=1)
test_df = time_delay_embedding(test, n_lags=12, horizon=1)
# creating the predictors and target variables
target_var = 'Series(t+1)'
X_train, y_train = train_df.drop(target_var, axis=1), train_df[target_var]
X_test, y_test = test_df.drop(target_var, axis=1), test_df[target_var]
# training a random forest ensemble with 100 decision trees
rf = RandomForestRegressor(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)
# getting predictions from each tree in RF
rf_pred = [tree.predict(X_test) for tree in rf.estimators_]
rf_pred = pd.DataFrame(rf_pred).T
# bias-variance-covariance decomposition
rf_a_bias, rf_a_var, rf_a_cov = BiasVarianceCovariance.get_bvc(rf_pred, y_test.values)
Here’s a sample of the forecasts:

Here’s how the error breaks into each term:

You can use this information to guide the development of an ensemble.
In this example, most of the expected error is due to the covariance term. You can reduce it by improving the diversity of the ensemble. We explored three approaches to do this. For example, you could try to prune the ensemble by removing correlated trees.
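For instance, using the illustrative prune_correlated helper sketched earlier, you could drop the most correlated trees from rf_pred and check how the covariance term changes:

```python
# prune highly correlated trees (prune_correlated is the illustrative helper above)
rf_pred_pruned = prune_correlated(rf_pred, threshold=0.95)

# recompute the decomposition on the pruned ensemble
p_bias, p_var, p_cov = BiasVarianceCovariance.get_bvc(rf_pred_pruned, y_test.values)
```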
Key Takeaways
- The expected error of a forecasting ensemble can be decomposed into three parts: bias, variance, and covariance;
- The covariance term measures the diversity in the ensemble;
- This decomposition is valuable for guiding the development of an ensemble;
- There are many ways of improving ensemble diversity. These include manipulating the training data, using different algorithms, or ensemble pruning.
Thanks for reading, and see you in the next story!
Related Articles
- Introduction to Forecasting Ensembles
- How to Combine the Forecasts of an Ensemble
- Dynamic Forecast Combination using R from Python
References
[1] Brown, G., Wyatt, J., Harris, R., & Yao, X. (2005). Diversity creation methods: a survey and categorisation. Information Fusion, 6(1), 5–20.
[2] Brown, G., et al. (2005). Managing diversity in regression ensembles. Journal of Machine Learning Research, 6(9).