Time Series Forecasting

Let’s Forecast Your Time Series using Classical Approaches

A Gentle Introduction to 14 Classical Forecasting Techniques and their Implementation in Python

Ajay Tiwari
Towards Data Science
12 min read · Aug 13, 2020

Background

In my previous articles, we covered various data preparation techniques and set up a robust evaluation framework. Now, we are ready to explore different forecasting techniques.

With the availability of so many machine learning models, we often forget the power of our classical algorithms, and they are a good place to start. Even though classical approaches focus on linear relationships, they perform well on a wide range of problems, assuming the data is suitably prepared.

Here is the list of techniques discussed in the current article, along with their Python implementations.

1. Univariate Time Series Forecasting
1.1. Autoregression
1.2. Moving Average
1.3. Autoregressive Moving Average
1.4. Autoregressive Integrated Moving Average
1.5. Seasonal Autoregressive Integrated Moving Average
2. Multivariate Time Series Forecasting
2.1. Vector Auto-Regression
2.2. Vector Moving Average
2.3. Vector Auto Regression Moving Average
3. Time Series Forecasting with Exogenous Variables
3.1. SARIMA with Exogenous Variables
3.2. Vector Autoregression Moving-Average with Exogenous Regressors
4. Time Series Forecasting with Smoothing Techniques
4.1. Moving Average Smoothing
4.2. Single Exponential Smoothing
4.3. Double Exponential Smoothing
4.4. Triple Exponential Smoothing

Univariate Time Series Forecasting

These are datasets where only a single variable is observed at each time step, such as temperature each hour. A univariate time series is modeled as a linear combination of its lags; that is, the past values of the series are used to forecast current and future values.

Autoregression (AR)

Autoregression models an output (the value at the next step) based on a linear combination of input variables (values at prior time steps). Recall simple linear regression:

y-hat = β₀ + β₁ * X

where y-hat is the prediction, β₀ and β₁ are coefficients calculated by the model on training data, and X is an input value.

Similarly, in time series we can predict the value at the next time step as a linear function of the observations at prior time steps:

y(t) = c + φ₁ * y(t−1) + φ₂ * y(t−2) + … + φ_p * y(t−p) + ε(t)

‘p’ is the autoregressive order, i.e., how many lags are included in the model. A suitable value for p can be determined from the partial autocorrelation (PACF) plot, which cuts off after lag p for an AR(p) process.

The method is suitable for time series without trend and seasonal components.
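
As a minimal sketch on synthetic data, the autocorrelation (ACF) and partial autocorrelation (PACF) plots used to pick the lag orders can be drawn with statsmodels:

# Plot ACF and PACF to guide the choice of the lag orders
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
from random import random

data = [x + random() for x in range(1, 100)]
plot_pacf(data, lags=20)  # PACF cuts off after lag p for an AR(p) process
plot_acf(data, lags=20)   # ACF cuts off after lag q for an MA(q) process
plt.show()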

Python Implementation — AR

# Import libraries
from statsmodels.tsa.ar_model import AutoReg
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = AutoReg(data, lags=1)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Moving Average (MA)

The difference between observed and predicted values is called the residual error. These errors from forecasts on a time series provide another source of information that we can model. It is calculated as:

residual error = observed - predicted
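
For example, a minimal sketch of computing residual errors, using the previous observation as a naive prediction:

# Residual errors from a naive one-step forecast (illustrative sketch)
from random import random

data = [x + random() for x in range(1, 100)]
# naive prediction for step t: the observation at step t-1
residuals = [data[t] - data[t - 1] for t in range(1, len(data))]
print(residuals[:5])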

The moving average method is therefore also called a model of residual error: it models the next step in the sequence as a linear function of the residual errors at prior time steps. You can observe this difference in the following equation:

y(t) = c + ε(t) + θ₁ * ε(t−1) + … + θ_q * ε(t−q)

‘q’ is the moving-average order, i.e., how many lagged residual errors are included in the model. A suitable value for q can be determined from the autocorrelation (ACF) plot, which cuts off after lag q for an MA(q) process.

The method is suitable for time series without trend and seasonal components.

Python Implementation — MA

# Import libraries
from statsmodels.tsa.arima.model import ARIMA
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit an MA(1) model: ARIMA with order (p, d, q) = (0, 0, 1)
# (the old statsmodels ARMA class has been removed in recent versions)
model = ARIMA(data, order=(0, 0, 1))
model_fit = model.fit()
# make a one-step out-of-sample prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Autoregressive Moving Average (ARMA)

The Autoregressive Moving Average (ARMA) method uses both of the above sources of information (original observations and residual errors) for forecasting; it is an advancement over the individual AR and MA models.

Therefore, this method models the next step in the sequence as a linear function of the observations and residual errors at prior time steps.

Modelers have to specify both the parameters p and q for both components of the model, i.e., autoregressive (AR) and moving average (MA).

The method is suitable for time series without trend and seasonal components.

Python Implementation — ARMA

# Import libraries
from statsmodels.tsa.arima.model import ARIMA
from random import random
# Generate a sample dataset
data = [random() for x in range(1, 100)]
# fit an ARMA(2, 1) model: ARIMA with order (p, d, q) = (2, 0, 1)
model = ARIMA(data, order=(2, 0, 1))
model_fit = model.fit()
# make a one-step out-of-sample prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Autoregressive Integrated Moving Average (ARIMA)

The statistical models we have discussed so far assume the time series to be stationary, but in reality most time series are not stationary, i.e., statistical properties of a series like the mean and variance change over time.

Therefore, we can add one more step as a pre-processing step, i.e., differencing (‘d’) the time series to make it stationary.

Now, we have a method that combines both the Autoregression (AR) and Moving Average (MA) models, along with a differencing pre-processing step that makes the sequence stationary, called integration (I).

Therefore, we need to find out whether the time series we are dealing with is stationary or not. We can diagnose stationarity by looking at seasonality and trend in time series plots, checking the difference in mean and variance for various periods, and applying the Augmented Dickey-Fuller (ADF) test. You can find these techniques in detail in my previous article, ‘Build Foundation for Time Series Forecasting’.
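
As a minimal sketch on synthetic trending (hence non-stationary) data, the ADF test is available in statsmodels:

# Augmented Dickey-Fuller test for stationarity
from statsmodels.tsa.stattools import adfuller
from random import random

data = [x + random() for x in range(1, 100)]  # upward trend: non-stationary
result = adfuller(data)
print('ADF statistic: %f' % result[0])
print('p-value: %f' % result[1])  # a large p-value suggests non-stationarity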

The method is suitable for time series with trend and without seasonal components.

Python Implementation — ARIMA

# Import libraries
from statsmodels.tsa.arima.model import ARIMA
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit an ARIMA(1, 1, 1) model
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# make a one-step out-of-sample prediction
# (the newer ARIMA class returns predictions on the original scale by default)
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Seasonal Autoregressive Integrated Moving Average (SARIMA)

This method is an extension of the ARIMA model to deal with seasonal data. It models seasonal and non-seasonal components of the series separately.

There are four seasonal parameters added to this approach in addition to the three trend-related parameters used in the ARIMA approach.

Non-seasonal parameters (same as ARIMA)

p: Autoregressive order
d: Differencing order
q: Moving average order

Seasonal parameters

P: Seasonal autoregressive order
D: Seasonal differencing order
Q: Seasonal moving average order
m: Number of time steps for a single seasonal period

The method is suitable for time series with trend and/or seasonal components.

Python Implementation — SARIMA

# Import libraries
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit model (a 12-step, e.g. monthly, seasonal period is assumed for illustration)
model = SARIMAX(data, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Multivariate Time Series Forecasting

These are datasets where two or more variables are observed at each time. In multivariate time series, each variable is modeled as a linear combination of past values of itself and the past values of other variables in the system.

Vector Auto-Regression (VAR)

It is a generalized version of the autoregression model to forecast multiple parallel stationary time series. It comprises one equation per variable in the system. The right-hand side of each equation includes a constant and lags of all of the variables in the system.

There are two decisions we have to make when using a VAR to forecast, namely how many variables (K) and how many lags (p) should be included in the system.

The number of coefficients to be estimated in a VAR is equal to K+pK² (or 1+pK per equation). For example, for a VAR with K=5 variables and p=2 lags, there are 11 coefficients per equation, giving a total of 55 coefficients to be estimated. The more coefficients that need to be estimated, the larger the estimation error.

Therefore, it is advisable to keep K small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags (p) to be included.
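
As a minimal sketch on synthetic correlated series, statsmodels can pick the lag order automatically by minimizing an information criterion such as AIC:

# Select the VAR lag order by information criterion
from statsmodels.tsa.vector_ar.var_model import VAR
from random import random

data = [[i + random(), i + 2 * random()] for i in range(100)]
model_fit = VAR(data).fit(maxlags=10, ic='aic')  # p chosen to minimize AIC
print(model_fit.k_ar)  # the selected lag order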

Python Implementation — VAR

# Import libraries
from statsmodels.tsa.vector_ar.var_model import VAR
from random import random
# Generate a sample dataset with correlated variables
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
# fit model
model = VAR(data)
model_fit = model.fit()
# make prediction: forecast() takes the most recent lagged observations as input
yhat = model_fit.forecast(model_fit.endog[-model_fit.k_ar:], steps=1)
print(yhat)

VAR can also be implemented using the VARMAX class in statsmodels, which allows estimation of VAR, VMA, VARMA, and VARMAX models through the order argument.
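
For example, a minimal sketch of a pure VAR(1) through VARMAX, obtained by setting the MA order to zero:

# A VAR(1) via the VARMAX class: order=(p, q) with q=0
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random

data = list()
for i in range(100):
    v1 = random()
    data.append([v1, v1 + random()])
model = VARMAX(data, order=(1, 0))
model_fit = model.fit(disp=False)
print(model_fit.forecast())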

Vector Moving Average (VMA)

It is a generalized version of the moving average model to forecast multiple parallel stationary time series.

Python Implementation — VMA

# Import libraries
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random
# Generate a sample dataset with correlated variables
data = list()
for i in range(100):
    v1 = i + random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
# fit VMA model by setting the ‘p’ parameter as 0.
model = VARMAX(data, order=(0, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)

Vector Auto Regression Moving Average (VARMA)

It is the combination of VAR and VMA and a generalized version of the ARMA model to forecast multiple parallel stationary time series.

This method requires ‘p’ and ‘q’ parameters and is also capable of acting like a VAR model by setting the ‘q’ parameter as 0 and as a VMA model by setting the ‘p’ parameter as 0.

Python Implementation — VARMA

# Import libraries
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random
# Generate a sample dataset with correlated variables
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
# fit model
model = VARMAX(data, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
yhat = model_fit.forecast()
print(yhat)

Time Series Forecasting with Exogenous Variables

SARIMA with Exogenous Variables (SARIMAX)

The Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX) is an extension of the SARIMA model that also includes the modeling of exogenous variables.

Before moving ahead, let's understand endogenous and exogenous variables.

An exogenous variable is one whose value is determined outside the model and is imposed on the model, like the input X in the regression example earlier.

An endogenous variable is one whose value is determined by the model. Here, the main series to be forecasted is an endogenous variable.

In time series, an exogenous variable is a parallel time series that is not modeled directly but is used as a weighted input to the model.

The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.

Python Implementation — SARIMAX

# Import libraries
from statsmodels.tsa.statespace.sarimax import SARIMAX
from random import random
# Generate a sample dataset with independent exogenous variable
data1 = [x + random() for x in range(1, 100)]
data2 = [x + random() for x in range(101, 200)]
# fit model
model = SARIMAX(data1, exog=data2, order=(1, 1, 1), seasonal_order=(0, 0, 0, 0))
model_fit = model.fit(disp=False)
# make prediction
exog2 = [200 + random()]
yhat = model_fit.predict(len(data1), len(data1), exog=[exog2])
print(yhat)

The SARIMAX method can also be used to model other variations with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX, by including an exogenous variable and setting the corresponding orders.

Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)

This method is an extension of the VARMA model that also includes the modeling of exogenous variables. It is a multivariate version of the ARMAX method.

The method is suitable for multivariate time series without trend and seasonal components with exogenous variables.

Python Implementation — VARMAX

# Import libraries
from statsmodels.tsa.statespace.varmax import VARMAX
from random import random
# Generate a sample dataset with correlated multiple time series and an independent exogenous variable
data = list()
for i in range(100):
    v1 = random()
    v2 = v1 + random()
    row = [v1, v2]
    data.append(row)
data_exog = [x + random() for x in range(100)]
# fit model
model = VARMAX(data, exog=data_exog, order=(1, 1))
model_fit = model.fit(disp=False)
# make prediction
data_exog2 = [[100]]
yhat = model_fit.forecast(exog=data_exog2)
print(yhat)

Time Series Forecasting with Moving Average Smoothing

Moving average smoothing is a simple and effective technique in time series forecasting. It shares a name with, but is very different from, the moving average (MA) model we discussed in the beginning: the earlier moving average (MA) is a model of residual errors, whereas this smoothing technique consists of averaging the values across a window of consecutive periods.

In general, two types of moving averages are used:

Centered Moving Average

The value at time (t) is calculated as the average of the current value and the values before and after time (t). For example, a centered moving average with window width 3:

centered_ma(t) = mean(obs(t+1), obs(t), obs(t-1))

Since the centered moving average requires the availability of future values, we often find this method impractical to use for forecasting.

Trailing Moving Average

The value at time (t) is calculated as the average of the current value and the values before time (t). For example, a trailing moving average with window width 3:

trail_ma(t) = mean(obs(t), obs(t-1), obs(t-2))

A trailing moving average uses only current and historical observations to predict the future.

This method is mainly used in feature engineering, but it can also be used in making predictions: we can use the newly created series of ‘moving average values’ to forecast the next step with a naive model. Before forecasting, we assume that the trend and seasonality components of the time series have already been removed or adjusted for.

Statsmodels does not provide a dedicated forecasting function for this approach, but we can create a custom one (you can refer to the naive model implementation in my previous article), as sketched below.
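
A minimal sketch, using pandas' rolling() to compute the trailing moving average and taking its latest value as the next-step (naive) forecast:

# Trailing moving average forecast with a naive model (illustrative sketch)
import pandas as pd
from random import random

data = pd.Series([random() for _ in range(100)])
# trailing moving average over a window of 3 consecutive values
trail_ma = data.rolling(window=3).mean()
# naive forecast: the next step equals the latest moving-average value
yhat = trail_ma.iloc[-1]
print(yhat)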

Time Series Forecasting with Exponential Smoothing

Exponential smoothing is a time series forecasting method for univariate data. It is a close alternative to the popular Box-Jenkins ARIMA class of methods.

Both approaches predict a weighted sum of past observations, but there is an important difference to be noted.

The ARIMA family develops a model where the prediction is a weighted linear sum of recent past observations (lags), whereas exponential smoothing explicitly uses exponentially decreasing weights for past observations.
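
For instance, with smoothing parameter alpha, the implied weights on past observations decay geometrically, as this small sketch shows:

# Exponentially decaying weights implied by a smoothing parameter alpha
alpha = 0.5
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print(weights)  # [0.5, 0.25, 0.125, 0.0625, 0.03125], most recent first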

Single Exponential Smoothing

This method is also called Simple Exponential Smoothing and is suitable for forecasting data with no clear trend or seasonal pattern. Mathematically,

y-hat(t+1) = α * y(t) + (1 − α) * y-hat(t)

where alpha (α), the smoothing parameter, falls between 0 and 1. A large value means the model pays importance mostly to the most recent observations, whereas a smaller value means older observations are more significant.

Python Implementation — SES

# Import libraries
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit model
model = SimpleExpSmoothing(data)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Double Exponential Smoothing

Double exponential smoothing is an extension of the above approach (SES) that allows the forecasting of data with a trend. Mathematically,

level:    l(t) = α * y(t) + (1 − α) * (l(t−1) + b(t−1))
trend:    b(t) = β * (l(t) − l(t−1)) + (1 − β) * b(t−1)
forecast: y-hat(t+h) = l(t) + h * b(t)

In addition to alpha, the smoothing factor for the level, an additional smoothing factor called beta is added to control the decay of the influence of the change in trend.

These methods tend to over-forecast, especially for longer forecast horizons. Motivated by this observation, Gardner & McKenzie (1985) introduced a parameter that “dampens” the trend to a flat line sometime in the future.

Therefore, in conjunction with the smoothing parameters α and β (with values between 0 and 1), this method also includes a damping parameter ϕ which also ranges between 0 and 1.

Python Implementation — Double Exponential Smoothing

# Import libraries
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit model ('damped_trend' replaces the older 'damped' argument in recent statsmodels)
model = ExponentialSmoothing(data, trend='add', seasonal=None, damped_trend=True)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Triple Exponential Smoothing

Triple Exponential Smoothing is the most advanced variation in the family, this method is also known as Holt-Winters Exponential Smoothing, named for two contributors to the method: Charles Holt and Peter Winters.

In addition to the alpha and beta smoothing factors, a new parameter called gamma (γ) is added that controls the influence of the seasonal component.

Damping is possible with both additive and multiplicative Holt-Winters' methods. The method often provides accurate and robust forecasts for seasonal data.

Python Implementation — Holt-Winters Exponential Smoothing

# Import libraries
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from random import random
# Generate a sample dataset
data = [x + random() for x in range(1, 100)]
# fit model with additive trend and seasonality, assuming a 12-step season
model = ExponentialSmoothing(data, trend='add', seasonal='add', seasonal_periods=12, damped_trend=True)
model_fit = model.fit()
# make prediction
yhat = model_fit.predict(len(data), len(data))
print(yhat)

Summary

In this article, you discovered the important classical forecasting techniques that can help you get started with your forecasting problems. You can dive deeper into the suitable techniques and tune them further as per your need. To build a robust forecasting model, you must start by performing the right data transformations, setting up an evaluation strategy, and having a persistence (benchmark) model ready.

Thanks for reading! Please feel free to share any comments or feedback.

Hope you found this article informative. In the next few articles, I will discuss some of the commonly used techniques in detail.

References

[1] Galit Shmueli and Kenneth Lichtendahl, Practical Time Series Forecasting with R: A Hands-On Guide, 2016.

[2] Jason Brownlee, https://machinelearningmastery.com/

[3] https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html

[4] https://www.statsmodels.org/stable/user-guide.html#time-series-analysis
