The world’s leading publication for data science, AI, and ML professionals.

Wisdom of the Forecaster Crowd

Ensemble Forecasts of Time Series with Darts

in Python

Chuotanhls / Chu Viet Don, Black Group Crowd – Free photo on Pixabay

In yesterday’s article, Darts’ Swiss Knife for Time Series Forecasting (Towards Data Science, Oct 2021), we discussed the Darts multi-method package and how to run a forecast tournament on a given time series. We reviewed which of the five chosen models – exponential smoothing, Theta, SARIMA, naïve forecast, and Facebook Prophet – would generate the best fit to the source data, the classic airline passenger numbers of Box & Jenkins.

If you read the article, you may remember that the grid of forecast plots had left the sixth cell, in the bottom-right corner, conspicuously empty.

image by author

Let’s fill this gap by going one step further than yesterday:

We will create an ensemble forecast that combines the five methods, with no more than a single line of code.

Will the ensemble scenario show forecast qualities that none of the individual methods can provide?

image by author

1. Creating an Ensemble Forecast

Before we discuss the concept behind ensemble forecasts, let’s generate one, and then compare its results with those of the individual models to see what insight, if any, it contributes.

We take the Jupyter notebook which we used in yesterday’s tutorial on multi-method forecasts and add one line to the dependencies cell: we import Darts’ RegressionEnsembleModel.

image by author

To concatenate the individual methods, which we list in the variable models,

image by author

we add a line of code that will process the ensemble forecast: The RegressionEnsembleModel takes

  • the models list as its first argument;
  • and the number of periods on which we want to train the ensemble.

The next three lines are the same as they were for the individual forecasters:

  • train the model
  • compute predictions
  • compute the residuals

We insert the RegressionEnsembleModel in an ensemble evaluation function, which has the same structure as the model evaluation function in yesterday’s multi-method notebook. In the evaluation function, we also compute the accuracy metrics and plot the forecast.

We call the evaluation function with the list of models we want it to tie together.

image by author

I modified the multi-method Jupyter notebook with respect to the accuracy metrics. Since we need the metric formulas more than once – first for the individual methods, then for the ensemble – I wrapped them in a function we can call with one line of code, rather than repeating their code throughout the script.

We call the metrics function, collect the accuracy metrics of the ensemble, and combine them in a dataframe with those of the individual methods.
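The notebook’s metrics function is not reproduced here; a minimal stand-in (the name `eval_metrics` and the plain-NumPy formulas are my assumptions, not the notebook’s code) could look like this:

```python
import numpy as np

def eval_metrics(actual, forecast):
    """Collect the accuracy metrics discussed in the text."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    mape = np.mean(np.abs(err / actual)) * 100          # in percent
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((actual - actual.mean()) ** 2)
    se = np.std(err, ddof=1)                            # standard error of the forecast
    return {"MAPE": mape, "RMSE": rmse, "R2": r2, "se": se}
```

Calling it once per method and once for the ensemble yields dictionaries that can be combined column-wise in a pandas dataframe.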

The ensemble forecast returns the same kind of outputs as any of the individual methods: forecast values we can plot; residuals; and prediction accuracy metrics.

image by author

2. What does the Ensemble Contribute?

Does the ensemble provide insights which we cannot gain from the other methods?

image by author

We review the metrics to find out if the ensemble provided an improvement over the other methods.

Note that I have also added a column "avg", the simple average of the five other methods for each metric. The ensemble’s metrics, of course, do not simply equal the values in this column; a plain average would be pointless, because it sits between the best and the worst scenarios and therefore cannot improve on all the individual methods.

The regression ensemble model, by contrast, can come up with predictions that are more accurate than even the best individual method can contribute. This improvement is not guaranteed. You see in the metrics table that the ensemble model leads the field in three metrics – RMSE, R-squared, and the standard error of the forecast – by small margins over the Theta method. But Theta inches ahead by an equally small amount in the MAPE row.

RMSE, which squares the prediction errors, penalizes large errors more heavily than MAPE does. Bias arises when the distribution of residuals is left- or right-skewed, so that the mean lies above or below the median; a forecast that minimizes the RMSE will tend to exhibit less bias. On the other hand, this sensitivity to outliers may be undesirable for source data that contain many of them. In the literature and in comment sections, you can find heated discussions about the relative strengths and weaknesses of RMSE and MAPE, as well as the pros and cons of a multitude of other metrics. Thus, we cannot pass a summary judgment, once and for all, that either MAPE or RMSE is superior for deciding a horse race among models.
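A tiny numeric illustration of the difference (the numbers are made up for illustration): two forecasts carry the same total absolute error, but one concentrates it in a single outlier. MAPE treats them alike; RMSE punishes the spike.

```python
import numpy as np

actual = np.array([100.0, 100.0, 100.0, 100.0])

# two forecasts with the same total absolute error of 20:
even    = np.array([105.0, 95.0, 105.0, 95.0])    # error spread evenly
outlier = np.array([120.0, 100.0, 100.0, 100.0])  # error in one spike

def mape(a, f):
    return np.mean(np.abs((a - f) / a)) * 100

def rmse(a, f):
    return np.sqrt(np.mean((a - f) ** 2))

print(mape(actual, even), mape(actual, outlier))  # both 5.0
print(rmse(actual, even), rmse(actual, outlier))  # 5.0 vs 10.0
```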

The ensemble’s R-squared is half a percent higher, so it explains an additional fraction of the movements in the actual observations.

The standard error of the forecast, se (that is, the estimated standard deviation of the forecast errors), is a tad smaller for the ensemble predictions; see Mathematics of Simple Regression (duke.edu) for background.

The vote is 3 to 1 in favor of the ensemble model, given our metrics. But this "vote" is not a hard criterion for declaring the winner of the contest between the methods. For other time series, an ensemble forecast could well achieve superior forecast quality and show more pronounced differences in the indicators. Since it cost us just three lines of code to create the ensemble, it is worthwhile to add it to the individual models whenever we run a method tournament on our source data, and to check whether it improves on the other methods.

Here, we just confirmed that the Theta method comes very close to the wisdom of the entire forecaster crowd, so close that the two are almost interchangeable.

3. Working with Ensemble Results

We can process the ensemble forecast values just as we did the results of any individual method. The ensemble is just another Darts model, one we have created on the fly and tailored to the concrete time series, rather than a theoretical method with a public name tag like Theta or ARIMA.

We can run Darts’ plot_residuals_analysis() function on the ensemble.

image by author

We add the ensemble to the previously empty sixth subplot in the bottom-right corner.

image by author

The Ljung-Box test from statsmodels and the normality test from SciPy can also be applied to the residuals of the ensemble.

image by author

4. What Is the Conceptual Idea Behind Ensemble Forecasts?

The goal of ensemble forecasts is a model that is more robust and generalizes better to new data points or other time series. The combined predictor is intended to reduce the standard error of the forecast.

The individual methods may exhibit weaknesses in dealing with a concrete time series, in identifying and handling outliers, or in processing shifts in its trend or seasonality. If we simply replace one method with a different one, we run the risk of stumbling over some weakness of the second method at some point. By combining several methods, the wisdom of the forecaster crowd can – in many cases, though not necessarily in all – iron out the weaknesses of single-method models.

The user selects the methods which Darts is to take as the building blocks for the ensemble. Then the RegressionEnsembleModel (if we don’t specify additional parameters) will run a linear regression model, with the forecasters as its regressors. The regression computes a linear combination of the selected forecasters that most closely aligns the predicted values with the actual observations. Thus, the regression determines to what extent each method should be weighted in the ensemble in order to minimize the deviations from the actual observations. The regressors are themselves the outcomes of complex forecast methods, rather than flat source numbers. The resulting ensemble is a weighted-average predictor.
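The core idea can be sketched with plain scikit-learn, outside of Darts (the toy numbers and names below are made up for illustration): stack the individual methods’ predictions as columns of a regressor matrix and fit them against the actual observations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# actual observations over the regression training window
actual = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])

# predictions of two (toy) individual forecasters over the same window
pred_a = np.array([110.0, 120.0, 130.0, 130.0, 120.0, 133.0])
pred_b = np.array([115.0, 115.0, 135.0, 127.0, 124.0, 138.0])

# each forecaster becomes one regressor column
X = np.column_stack([pred_a, pred_b])

# the regression learns how to weight the forecasters
reg = LinearRegression().fit(X, actual)
print(reg.coef_)  # the learned weights of the two methods

# the ensemble forecast is the weighted combination of the predictions
ensemble = reg.predict(X)
```

In-sample, the fitted combination can never do worse than any single regressor, since each individual forecaster is itself one admissible combination; this is the sense in which the ensemble can beat its best member.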

The RegressionEnsembleModel accepts more complex ensembling functions than a simple linear regression, provided that these functions adhere to the scikit-learn pattern by implementing fit() and predict() methods.

The Jupyter notebook is available for download on GitHub: h3ik0th/Darts_ensembleFC: Python time series ensemble forecasts with Darts (github.com)

