
In this three-part series, we explore a Python forecasting library that uses minimal code to examine time series and forecast with popular, well-known machine learning models. Its strengths include:
- A dynamic Forecasting/test-set prediction process with autoregressive terms that guards against data leakage
- Many ways to account for seasonality
- Easily obtained level results and error metrics, whether or not the series was differenced
Part 1 uses this framework to forecast a single series. Part 2 scales the approach to over 100 series. Part 3 returns to a single series but showcases the flexibility to move between modeling at different levels of integration. If this is a project you like and are interested in contributing to and/or maintaining, contact [email protected]. See it on GitHub.
First, to install:
pip install --upgrade scalecast
Now, to the code. Import the Forecaster object:
from scalecast.Forecaster import Forecaster
Next, import data and save it into the Forecaster object. We will be using the HOUSTNSA series, which measures monthly new housing starts in the U.S. since 1959. To load the time series data into the Forecaster object, which we call f, use this code:
import matplotlib.pyplot as plt
import pandas_datareader as pdr
df = pdr.get_data_fred('HOUSTNSA',start='1900-01-01',end='2021-05-01')
f = Forecaster(y=df['HOUSTNSA'],current_dates=df.index)
We end the series in May 2021 to make the results reproducible. Let’s plot the time series.
f.plot()

An important part of forecasting is examining the time series’ statistical properties. How much do current values correlate with past values? What is the seasonal nature of the data? Is the data stationary? We examine these questions using an Autocorrelation Function (ACF) Plot, Partial-autocorrelation Function (PACF) Plot, and Seasonal Decomposition Plot. First, let’s look at ACF and PACF plots.
f.plot_acf()
f.plot_pacf()
plt.show()


These show that there is a high amount of correlation between present and past values, otherwise known as autocorrelation, up to 25 periods back. The first plot is a simple examination of this correlation; the second plot controls for the correlation already explained by intervening lags.
When such a pattern emerges, it is a strong indicator that the series is not stationary (its mean does not remain constant over time), which makes it difficult to forecast because it partly follows a "random walk" pattern, moving up and down seemingly at random. One way around this is to transform the data so that each value becomes the difference between the current value and the previous one. Let's view the ACF and PACF plots again, this time using diffy=True to pass differenced data to the functions.
f.plot_acf(diffy=True)
f.plot_pacf(diffy=True)
plt.show()


Although there is still a good deal of autocorrelation, there is not nearly as much as before. We also notice seasonal spikes around lags 12 and 24 months. Let's further examine the seasonality in the differenced data:
f.seasonal_decompose(diffy=True).plot()
plt.show()

We can see from this plot that there is monthly seasonality, meaning past Januaries tend to resemble one another, as do Februaries, and so forth. The residuals also appear to be randomly distributed. That is all good information to consider when deciding how to preprocess the series and which regressors to use, which we will discuss shortly.
It is important to hold out a chunk of the dataset to test the accuracy of the final forecasts. Let’s use 12 months to do so. We also want to forecast 24 months into the future.
f.set_test_length(12)
f.generate_future_dates(24)
You may wonder what calling these methods returns. They do not return anything; they simply save information within the object itself. I like this because we do not have to track several datasets or pandas DataFrames; all the relevant information is stored in the object and can easily be accessed later.
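As a quick illustration, you can peek at what was just saved (the attribute names below are my assumptions about how the object stores this information and may vary by version):

# sanity-check what the object now holds -- these attribute names are
# assumptions and may differ across scalecast versions
print(f.test_length)        # expected: 12
print(len(f.future_dates))  # expected: 24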
Let’s decide what regressors to add to the Forecaster object. From the ACF plot, we can see there is significant correlation between present and past values going back several periods; let’s use 4 past values (otherwise known as lags or autoregressive terms) for now. We can also capture some seasonality with the 12th and 24th lags.
f.add_ar_terms(4) # 4 AR terms
f.add_AR_terms((2,12)) # 2 seasonal AR terms
In our object, these regressors are stored as ‘AR1’, ‘AR2’, ‘AR3’, ‘AR4’, ‘AR12’, and ‘AR24’. The beautiful part of this module is that all forecasting and test-set prediction with AR terms uses a dynamic process to fill in future values with current predictions. This guards against much of the overfitting and data leakage seen in other forecasting applications while still allowing predictions to be dynamic. However, it also means some model evaluation is slow due to a liberal use of loops in the source code. Don’t let that discourage you!
Let’s now difference the series within the object to make it stationary. We also run a statistical test called the Augmented Dickey-Fuller test to further examine stationarity, then replot the data.
f.diff()
f.adf_test(quiet=False) # prints: series appears to be stationary
f.plot()

The series seems less interpretable when plotted like this, but it will be easier to forecast effectively than at its original level. More importantly, it is now stationary according to the statistical test.
The diff() method will difference all autoregressive terms (which it recognizes as the variables that begin with the "AR" label) in addition to the main series. That being said, let’s now add regressors that capture the effects of month and year.
f.add_seasonal_regressors('month',raw=False,sincos=True)
f.add_seasonal_regressors('year')
This method depends on the pandas.Series.dt attribute and the pandas.Series.dt.isocalendar() method. By default, a series of integer type will be returned using labels such as "month", "week", "day", "dayofweek", and more (see here). Since we do not necessarily see a direct linear relationship between the month of any given year and how high or low the values in our series are, we can elect not to use the integer values for month (raw=False) and instead convert the values to sine and cosine wave functions (sincos=True), which model the data with a more cyclical pattern and should increase accuracy. Also available is dummy=True, which creates dummy variables out of the regressors. All of these transformations have their pros and cons, and you are encouraged to try different methods when specifying your own models. So far, our regressors include the AR terms mentioned earlier and three new ones called "monthsin", "monthcos", and "year".
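To make the sine/cosine idea concrete, here is roughly what that encoding does to an integer month value (a conceptual sketch, not scalecast's internal code):

import numpy as np

# map the 12-month cycle onto the unit circle so that December (12) and
# January (1) end up close together instead of 11 units apart
month = df.index.month
monthsin = np.sin(2 * np.pi * month / 12)
monthcos = np.cos(2 * np.pi * month / 12)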
Let’s also add a COVID19 regressor, time trend, combination time trend with COVID19, and polynomial time trends of powers 2 and 3.
f.add_covid19_regressor() # called 'COVID19' by default
f.add_time_trend() # called 't' by default
f.add_combo_regressors('t','COVID19') # 't_COVID19'
f.add_poly_terms('t',pwr=3) # 't^2' and 't^3'
The default COVID19 regressor date range is from the day Walt Disney World (and many other prominent businesses) closed in the U.S. to when the CDC lifted its mask recommendation for vaccinated people, but these dates, as well as the name of most variables added in this manner, can be adjusted.
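For example, if you needed a different window or name, a call along these lines should work in place of the default one above (the called, start, and end argument names are my reading of the documentation; verify against your installed version):

import datetime

# hypothetical override of the default window and name -- argument names
# are assumptions; check the scalecast docs for your version
f.add_covid19_regressor(
    called='COVID19_custom',
    start=datetime.datetime(2020, 3, 15),
    end=datetime.datetime(2021, 5, 13),
)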
Now that all regressors have been added, we can model. Another useful feature of the scalecast module is its ability to automatically tune a diverse set of models. The easiest way to do this is to create a Grids.py file in the working directory, where we specify model validation grids in the form of Python dictionaries. Each model is validated on a slice of data, of a length we choose, that occurs immediately before the test-set periods. Every combination written to the grids file will be tried for a given model; the combination that performs best on the validation set will be recommended, and we can apply it with the auto_forecast() method. In the Grids.py file, you can add the following contents:
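The file itself was embedded as a gist in the original post; an abbreviated, illustrative version might look like the following, where each dictionary must be named after the scalecast model nickname it tunes and the hyperparameter values shown are examples only:

# Grids.py -- illustrative grids only; dict names must match model nicknames
mlr = {'normalizer': ['scale', 'minmax', None]}
knn = {'n_neighbors': range(2, 20), 'weights': ['uniform', 'distance']}
svr = {'kernel': ['linear', 'rbf'], 'C': [0.5, 1, 2, 3], 'epsilon': [0.01, 0.1, 0.5]}
xgboost = {'max_depth': [2, 3, 4, 5, 6]}
gbt = {'max_depth': [2, 3], 'max_features': ['sqrt', None]}
elasticnet = {'l1_ratio': [0, 0.25, 0.5, 0.75, 1], 'alpha': [0.1, 0.5, 1, 2]}
mlp = {'activation': ['relu', 'tanh'], 'hidden_layer_sizes': [(25,), (25, 25)], 'solver': ['lbfgs', 'adam']}
prophet = {'n_changepoints': range(5, 100, 10)}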
Note, you need at least one grid for each model you want to tune.
We run the following code to specify a 6-period validation set and tune 8 different models:
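That code was embedded as a gist in the original post; a sketch of the equivalent calls, using the scalecast methods as I understand them, is:

f.set_validation_length(6)  # tune on the 6 periods immediately before the test set

models = ('mlr', 'knn', 'svr', 'xgboost', 'gbt', 'elasticnet', 'mlp', 'prophet')
for m in models:
    f.set_estimator(m)  # choose the model to run next
    f.tune()            # try every combination in its grid on the validation set
    f.auto_forecast()   # forecast with the best combination found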
We have now tuned and forecasted with a Multiple Linear Regression, K-nearest Neighbors, Support Vector Regression, Extreme Gradient Boosted Tree, Gradient Boosted Tree, Elasticnet, Multi-layer Perceptron, and Prophet model. These all come from popular machine learning libraries: Scikit-Learn, XGBoost, and Facebook Prophet, so documentation on each is abundant.
The next bit of code combines these models to create two additional forecasts: a simple average of the top-3 performing forecasts according to the tuning process and a weighted average of all models where the weights are automatically assigned based on each model’s respective performance in the tuning process. You are encouraged to experiment with your own combination modeling, as there are many ways to do so.
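The combination code was also embedded in the original post; a sketch using the combo estimator (the how, models, determine_best_by, and call_me arguments are my reading of the API) might look like:

f.set_estimator('combo')
# simple average of the 3 models that did best during tuning
f.manual_forecast(
    how='simple',
    models='top_3',
    determine_best_by='ValidationMetricValue',
    call_me='avg',
)
# weighted average of all models, weighted by validation performance
f.manual_forecast(
    how='weighted',
    determine_best_by='ValidationMetricValue',
    call_me='weighted',
)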
Now we have a set of 10 highly optimized models forecasted on this time series. However, that does not mean they are all good. Some of them are bound to come out a little bit iffy – the point of trying so many is to get at least a couple that work. So, let’s plot the top-5 best ones, according to their respective root mean squared error (RMSE) performances on the test set to see how they look. Let’s also print to console some interesting information about each evaluated model (see here for all possible values to print):
f.plot(models='top_5',order_by='TestSetRMSE')
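One way to print that summary information (the export interface and the exact column names are assumptions that may vary by scalecast version) is:

# one row per evaluated model, with error metrics and other summary info
summaries = f.export('model_summaries', determine_best_by='TestSetRMSE')
print(summaries[['ModelNickname', 'TestSetRMSE', 'TestSetMAPE', 'InSampleRMSE']])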

This is nice, and we can see that the Prophet model was the best according to the error metric we selected, but what do these forecasts actually tell us about housing? Nothing obvious, since we differenced our data and the units are month-to-month changes in housing starts rather than total new houses. Getting these forecasts to reveal anything meaningful takes work. Or does it? In this module, it’s actually very easy. Let’s see this plot again, except now we want level forecasts, so we pass level=True. We also order the models by their level test-set mean absolute percentage error (MAPE) values and print more useful information to the console:
f.plot(models='top_5',order_by='LevelTestSetMAPE',level=True)

Wow! Real, usable forecasts just like that, and considering how much analysis we got out of it, not a lot of code. Our most accurate model according to the level MAPE (K-nearest Neighbors) is off by an average of only about 8% per period on test-set data. The top-3 models were the same according to both error metrics we selected, and all three look reasonable. Let’s now export the results to Excel.
f.export(
    to_excel=True,
    determine_best_by='LevelTestSetMAPE',
    excel_name='housing_results.xlsx'
)
To conclude, we have an Excel workbook and plots with all forecasts evaluated at differenced and undifferenced levels, statistical information about each model, a specification of which is our best forecast, and a comparison of test-set predictions with actuals. There is minimal chance of overfitting because all our forecasting was dynamic and well-validated. There is much more you can do with this module. Be sure to read the official scalecast documentation and check out part 2!
Note: This blog post was originally written based on the functionality from one of the earliest scalecast versions. I have tried to update the code in this post, but some of it may not work with the latest version. See the latest introduction notebook to scalecast.