
A popular classical time series forecasting technique is Vector Autoregression (VAR). The idea behind this method is that the past values (lags) of multiple series can be used to predict the future values of the others in a linear fashion, so all of the series are forecast together.
When would you want to use such a method? When you have two or more series that you suspect influence one another, such as interest rates and inflation, this is a valid approach. It is very important when running VAR that all series are stationary. Once you have identified the series you want to forecast and ensured their stationarity (by differencing or otherwise transforming non-stationary series), the only parameter left to consider is the number of lags to use as predictors. This can be chosen with an information criterion search over different numbers of lags. There are extensions to the VAR method, including adding moving-average terms (VARMA), applying error-correction terms (VECM), adding exogenous variables (VARMAX), and modeling seasonality (SVAR).
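For reference, here is a minimal sketch of that classical workflow with statsmodels, assuming var_data is a pandas DataFrame holding two or more already-stationary series (for example, differenced interest rates and inflation):
from statsmodels.tsa.api import VAR

# fit a VAR on the stationary series
model = VAR(var_data)
# search lag orders with an information criterion (AIC here)
lag_selection = model.select_order(maxlags=12)
print(lag_selection.summary())
results = model.fit(maxlags=12, ic='aic')
# forecast all series together for the next 8 periods
forecast = results.forecast(var_data.values[-results.k_ar:], steps=8)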
Extending VAR Concepts to Machine Learning
The logical question is, what if you changed the underlying linear function in this process? Or, what if you mixed and matched several of the concepts described above? There are ways to use this general approach with a more machine learning-oriented procedure, such as the models available in the Scikit-learn library. Considering how time-series data should be prepared and the difficulties of updating predictions and model inputs, writing such a procedure from scratch would be time-consuming. Thankfully, some Python packages, like darts, scalecast, and others, take a lot of the headache out of it for you.
Today, I will demonstrate how to apply this approach to forecasting with scalecast. Setting up the process and extracting final results are easy. The package dynamically forecasts all series you feed to it in a multistep process. A drawback is that there is not a lot of research around applying Machine Learning models for multivariate forecasting in this way, at least that I am aware of. But it’s fun to explore, nonetheless. See the full notebook here. The data is available on Kaggle with an Open Database license. Give scalecast a star on GitHub if you find it interesting:
GitHub – mikekeith52/scalecast: Forecast time series easily and dynamically at scale with many…
Exploratory Data Analysis
We will be looking at how to forecast the following two series:

These measure weekly conventional and organic avocado sales in California from January 2015 through the end of March 2018. The graph requires a dual axis because Conventional has a much higher volume than Organic. Running a Pearson correlation calculation on them, we see that their coefficient is 0.48. The two series definitely move together and exhibit similar trends, albeit on different scales.
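A quick sketch of the dual-axis plot and the correlation check, assuming data_cali_con and data_cali_org are the California slices of the Kaggle data used throughout this article:
import numpy as np
import matplotlib.pyplot as plt

# dual-axis plot: conventional volume on the left axis, organic on the right
fig, ax1 = plt.subplots(figsize=(12, 6))
ax1.plot(data_cali_con['Date'], data_cali_con['Total Volume'], color='tab:blue')
ax1.set_ylabel('Conventional volume')
ax2 = ax1.twinx()
ax2.plot(data_cali_org['Date'], data_cali_org['Total Volume'], color='tab:orange')
ax2.set_ylabel('Organic volume')
plt.show()

# Pearson correlation between the two volume series
corr = np.corrcoef(
    data_cali_con['Total Volume'].values,
    data_cali_org['Total Volume'].values,
)[0, 1]
print(f'Pearson r: {corr:.2f}')  # roughly 0.48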
Next, we check the stationarity of both series. Using a common test for this, the Augmented Dickey-Fuller test, we see that both series can be considered stationary with 95% confidence. This means we can try modeling them at their original levels. However, both look as though they are following a trend (and are therefore not stationary), and neither test confirmed stationarity at the 99% confidence level. We can assume they are stationary for now, but an interesting extension to this analysis would be to difference each series once, which scalecast allows you to do easily.
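One way to run that check directly is with the adfuller function from statsmodels, sketched here on the same two DataFrames (the null hypothesis is that a series has a unit root, i.e. is non-stationary):
from statsmodels.tsa.stattools import adfuller

for name, df in [('Conventional', data_cali_con), ('Organic', data_cali_org)]:
    stat, pval, *_ = adfuller(df['Total Volume'])
    # a p-value below 0.05 rejects the unit-root null with 95% confidence
    print(f'{name}: ADF statistic = {stat:.2f}, p-value = {pval:.4f}')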

Let’s use the autocorrelation and partial autocorrelation functions to see how many lags are statistically significant in each series:
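A minimal way to produce those plots with statsmodels, assuming the same two DataFrames as above (the choice of 26 lags here is only for illustration):
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 2, figsize=(14, 8))
plot_acf(data_cali_con['Total Volume'], ax=axs[0, 0], lags=26, title='Conventional ACF')
plot_pacf(data_cali_con['Total Volume'], ax=axs[0, 1], lags=26, title='Conventional PACF')
plot_acf(data_cali_org['Total Volume'], ax=axs[1, 0], lags=26, title='Organic ACF')
plot_pacf(data_cali_org['Total Volume'], ax=axs[1, 1], lags=26, title='Organic PACF')
plt.show()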

From these plots, it is hard to tell exactly how many lags would be ideal to forecast with. It appears to be at least three, but possibly up to 20, with some gaps. Let's err on the side of fewer lags and use three. Later, we show how to incorporate this decision in the code.
Another important component to consider is each series’ seasonality. We can see seasonality visually by using a seasonal decomposition method:
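One way to do this is with the seasonal_decompose function from statsmodels, sketched here for the Conventional series with an additive model and an assumed 52-week period (the same idea applies to the Organic series):
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

decomp = seasonal_decompose(
    data_cali_con.set_index('Date')['Total Volume'],
    model='additive',
    period=52,
)
decomp.plot()
plt.show()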


Looking at this output, there does appear to be an upward trend in at least the Organic series, in spite of the Augmented Dickey-Fuller test indicating stationarity. There is strong seasonality as well, with what look like annual (52-period) and semi-annual (26-period) cycles. The above is a linear (additive) decomposition of the data, but the residuals still seem to follow a pattern, suggesting a linear model might not be the best fit. We can try both linear and non-linear approaches and add different types of seasonality.
Finally, let’s use everything we have learned about these series to make our modeling decisions. First, the models we will apply:
models = ('mlr','elasticnet','knn','rf','gbt','xgboost','mlp')
The MLR and ElasticNet models are both linear, the ElasticNet being a linear model with a mix of L1 and L2 regularization. The other models are all non-linear: k-nearest neighbors, random forest, two boosted trees, and a multilayer perceptron neural network.
Using the scalecast process, we can now create Forecaster objects to store information about each series and the way we want to try to forecast them:
from scalecast.Forecaster import Forecaster

# load the conventional series
fcon = Forecaster(
    y=data_cali_con['Total Volume'],
    current_dates=data_cali_con['Date'],
)
# load the organic series
forg = Forecaster(
    y=data_cali_org['Total Volume'],
    current_dates=data_cali_org['Date'],
)

for f in (fcon, forg):
    # set a forecast horizon of 1 year (52 weeks)
    f.generate_future_dates(52)
    # set a 20% test length
    f.set_test_length(.2)
    # set aside 4 weeks for validation
    f.set_validation_length(4)
    # add seasonality in the form of wave (sine/cosine) functions
    f.add_seasonal_regressors(
        'week',
        'month',
        'quarter',
        raw=False,
        sincos=True,
    )
    # add the year as a regressor
    f.add_seasonal_regressors('year')
    # add a time trend
    f.add_time_trend()
    # add an irregular seasonal cycle of 26 periods
    f.add_cycle(26)
    # add three lags of the dependent variable
    f.add_ar_terms(3)
Univariate Forecasting
Before we extend the analysis into multi-series forecasting, let's benchmark performance by applying a univariate process. By importing validation grids from scalecast, which are saved to the working directory as Grids.py (for univariate processes) and MVGrids.py (for multivariate), we can automatically tune, validate, and forecast with our selected models, using the regressors (the lags, seasonal regressors, and time trend) we have already added:
from scalecast import GridGenerator

GridGenerator.get_example_grids(overwrite=False)
GridGenerator.get_mv_grids(overwrite=False)
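For reference, each grid in those files is just a module-level dictionary named after an estimator, mapping hyperparameters to the candidate values to search. A hypothetical entry (the actual grids scalecast downloads will differ) might look like:
knn = {
    'n_neighbors': range(2, 20),
    'weights': ['uniform', 'distance'],
}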
We can then call the forecasting process:
fcon.tune_test_forecast(models,feature_importance=True)
forg.tune_test_forecast(models,feature_importance=True)
We can also add a weighted-average ensemble model to each object:
fcon.set_estimator('combo')
fcon.manual_forecast(how='weighted')
forg.set_estimator('combo')
forg.manual_forecast(how='weighted')
And plot the results:
fcon.plot_test_set(ci=True,order_by='LevelTestSetMAPE')
plt.title('Conventional Univariate Test-set Results',size=16)
plt.show()
forg.plot_test_set(ci=True,order_by='LevelTestSetMAPE')
plt.title('Organic Univariate Test-set Results',size=16)
plt.show()


Interesting patterns and predictions emerge. Let's also export some model summaries to see numerically how each model did:
import pandas as pd

pd.set_option('display.float_format', '{:.4f}'.format)
ms = export_model_summaries(
    {'Conventional': fcon, 'Organic': forg},
    determine_best_by='LevelTestSetMAPE',
)
ms[
    [
        'ModelNickname',
        'Series',
        'Integration',
        'LevelTestSetMAPE',
        'LevelTestSetR2',
        'InSampleMAPE',
        'InSampleR2',
        'best_model',
    ]
]

Going by the test-set MAPE alone, XGBoost performed the best for the Conventional series and KNN for the Organic, both non-linear models. However, the XGBoost appears to be severely overfit. Let's move to multivariate modeling to see if we can improve the results.
Multivariate Forecasting
To extend the univariate modeling above into a multivariate concept, we need to pass the created Forecaster objects from scalecast into an MVForecaster object. This is done as such:
from scalecast.MVForecaster import MVForecaster

mvf = MVForecaster(fcon,forg,names=['Conventional','Organic'])
For more depth about what this line of code does, see here. Basically, any number of Forecaster objects can be passed to this new object. You should already have set the forecast horizon and added any Xvars you want to use before building this new object; otherwise, you will only have the lags of each series to forecast with, and the chance to add seasonality and exogenous regressors will be lost. The new MVForecaster object picks those parameters up from the two Forecaster objects we fed it, but we do need to reset the test and validation lengths:
mvf.set_test_length(.2)
mvf.set_validation_length(4)
We have more than one series to forecast, but we can still use an automated approach similar to the one in the univariate section. The difference is that models are now tuned not on the selected error metric for a single series (RMSE by default when tuning models), but on an aggregation of that metric across the multiple series. For instance, if forecasting the Conventional series accurately were more important than getting the other series right for whatever reason, we could tell the object to optimize the selected error metric on that series only:
mvf.set_optimize_on('Conventional')
To change it to optimize on the mean metric across series (which is also the default behavior), we can use:
mvf.set_optimize_on('mean')
Now, we run the automated forecasting procedure:
mvf.tune_test_forecast(models)
After this completes, we tell the object to set the best model based on a metric we choose. I select the test-set MAPE. This will choose the model with the best test-set MAPE on average across both series:
mvf.set_best_model(determine_best_by='LevelTestSetMAPE')
Now, let’s see the results. We could plot all models on all series together, but since Conventional and Organic volumes are on massively different scales, we still want to plot them one-by-one, like we did in the univariate section:
mvf.plot_test_set(
    series='Conventional',
    put_best_on_top=True,
    ci=True,
)
plt.title('Conventional Multivariate Test-set Results',size=16)
plt.show()
mvf.plot_test_set(
    series='Organic',
    put_best_on_top=True,
    ci=True,
)
plt.title('Organic Multivariate Test-set Results',size=16)
plt.show()


Very nice! Before examining these further, let’s explore another type of ensemble model that can be performed with multivariate forecasting in scalecast.
Model Stacking
In the univariate section, we applied an ensemble model that is native to scalecast – the weighted-average model. Multivariate forecasting only allows Scikit-learn models to be applied, so we don’t have that same combination model available, but there is a different ensemble model that can be used: the StackingRegressor.
from sklearn.ensemble import StackingRegressor
mvf.add_sklearn_estimator(StackingRegressor,'stacking')
Now, we build the model using other models we have previously applied and tuned, like so:
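A hedged sketch of that definition, with hypothetical hyperparameter values standing in for the tuned ones from the notebook:
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor

# base models whose predictions feed the final estimator; the hyperparameter
# values below are placeholders, not the tuned values from the notebook
estimators = [
    ('mlr', LinearRegression()),
    ('elasticnet', ElasticNet(alpha=0.9, l1_ratio=0.5)),
    ('mlp', MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=500)),
]
# the final estimator that learns from the base models' predictions
final_estimator = KNeighborsRegressor(n_neighbors=5)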
It may look complicated, but this is combining our previously defined MLR, ElasticNet, and MLP models into one, where the predictions from each of these become the inputs for a final model, a KNN regressor. I chose these models because they showed the fewest signs of overfitting along with solid test-set error metrics. Let's call that model now in scalecast. We use 13 lags to train it because, as we will see soon, 13 was the lag count most commonly chosen during model tuning, although we could tune that parameter with a grid search if we wanted.
mvf.set_estimator('stacking')
mvf.manual_forecast(estimators=estimators,final_estimator=final_estimator,lags=13)
Now, we can see model performance for all models:
mvf.set_best_model(determine_best_by='LevelTestSetMAPE')
results2 = mvf.export_model_summaries()
results2[
    [
        'ModelNickname',
        'Series',
        'HyperParams',
        'LevelTestSetMAPE',
        'LevelTestSetR2',
        'InSampleMAPE',
        'InSampleR2',
        'Lags',
        'best_model',
    ]
]

The ElasticNet had the best average error across both series, which is why it is labeled the best for both series in the output above, but the KNN did better than the ElasticNet on the Organic series. We can choose the ElasticNet as the final model for the Conventional series and the KNN for the Organic. Both of these best models' MAPE metrics are lower than those of the best models from the univariate approach, indicating better overall performance. Let's see how they look plotted over the 52-period forecast horizon:
mvf.plot(series='Conventional',models='elasticnet',ci=True)
plt.title(
    'Elasticnet Forecast - Conventional - MAPE 0.1168',
    size=16,
)
plt.show()
mvf.plot(series='Organic',models='knn',ci=True)
plt.title(
    'KNN Forecast - Organic - MAPE 0.0889',
    size=16,
)
plt.show()


What's important with these plots is to consider whether they look believable. To me, a forecast passing the eye test is the best indication that it is usable in the real world. Both of these models look okay as far as I can tell, although neither captures the overall spikiness of the series well. Maybe with more data or a more sophisticated modeling procedure, that irregular pattern could be modeled better, but for now, this is what we will stick with.
Conclusion
This was an overview of multivariate forecasting in Python using scalecast. The modeling process is very simple and automated, which is good for getting results quickly, but there are caveats to such an approach. By applying many models, it is possible to get lucky with some techniques and essentially overfit the validation data. I echo the word of caution from the developers of the darts package:
"So [which of the applied models is best]? Well, at this point it’s actually hard to say exactly which one is best. Our time series is small, and our validation set is even smaller. In such cases, it’s very easy to overfit the whole forecasting exercise to such a small validation set. That’s especially true if the number of available models and their degrees of freedom is high (such as for deep learning models), or if we played with many models on a single test set (as done in this notebook).
With that sentiment in mind, it is also worth seeing the average errors from each technique, while remembering that several of our models showed signs of overfitting.


With this view of the analysis, it is not obvious that either technique was better than the other, as some error metrics decreased with some models and increased with others. Whatever decision we make needs to be tempered by common sense. The decisions one person makes in such an analysis might be different than the decisions of another, and both could be valid. At the end of the day, forecasting is the science of telling the future, and no one is going to get the future 100% right. To finish out the thought from the darts developers:
"As data scientists, it is our responsibility to understand the extent to which our models can be trusted. So always take results with a grain of salt, especially on small datasets, and apply the scientific method before making any kind of forecast 🙂 Happy modeling!"
Thank you for following along!