
Time series forecasting can be a simple exercise if you understand the basics, much like snowboarding. There are many time series forecasting examples out there; however, I learn more when the example interests me. So I have decided to create my own example about one of my favourite sports – snowboarding!
This article is structured as follows:
- Looking at the different types of data you can have.
- Understanding the features of time series data.
- Exploring the types of models you can and cannot apply to your data.
- Snowboarding use case – feel free to use your own data set.
Getting the basics right
What kind of data are you working with?
Cross sectional data
Data with no time dimension – a set of characteristics and measures captured at a single point in time, e.g., age, gender, average income and schooling.
Time series data
Data that has a time dimension – a single variable observed over an extended period, e.g., the average household income (HHI) over the past 5 years. If you wanted to forecast HHI for the next year, this is called univariate time series forecasting, because you are forecasting a single variable.
*We will be focusing on univariate time series forecasting in the use case.
Pooled data
This is a combination of cross-sectional and time series data, e.g., observing sales over time together with customer satisfaction rates and real disposable income (RDI). If you wanted to forecast sales for the next year using exogenous factors (e.g., customer satisfaction and RDI), this is called multivariate time series forecasting, because multiple variables are modelled over time.
Time series forecasting can be done simply by observing historical patterns, or one can use a machine learning model (e.g., neural nets or random forest regressions). However, historical patterns are not always a good indicator of future behaviour, machine learning models can overcomplicate the analysis, and you need to consider how you will explain these models to stakeholders.
Understanding the features of your time series data

Trend:
This is the general direction of your observation over an extended period. Is it upward, downward or horizontal (often known as stationary)?

Seasonality:
If there is a pattern that repeats at equal time intervals, then your data contains seasonality:

Irregular variation:
In addition to seasonality and trend, there may be spikes in your data e.g., holiday spikes or any change in trend:
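To make these three features concrete, here is a small synthetic sketch (illustrative data only, not the snowboard dataset) that builds a series from a trend, a repeating seasonal pattern and random noise:
# Synthetic illustration: trend + seasonality + irregular variation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
months = pd.date_range('2015-01-01', periods=60, freq='MS')        # 5 years of monthly dates
trend = np.linspace(10, 50, 60)                                     # steady upward trend
seasonality = 10 * np.sin(2 * np.pi * np.arange(60) / 12)           # pattern repeating every 12 months
noise = np.random.normal(0, 3, 60)                                  # irregular variation
pd.Series(trend + seasonality + noise, index=months).plot(title='Trend + seasonality + noise')
plt.show()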

Which forecasting model to use?
The following guideline recommends the model to use according to the features of your data (Wakamiya, C (2020)):
No seasonality + Weak Trend:
● Simple Exponential Smoothing
● ARIMA
Seasonality and Trend:
You get additive and multiplicative seasonality:

According to the work done by [Pegels, C.C. (1969)](http://dx.doi.org/10.1287/mnsc.15.5.311), there are a number of time series patterns when forecasting:

● Triple Exponential Smoothing (Holt Winters) – where you can define the type of seasonality for both additive and multiplicative.
● SARIMA – the seasonal extension of the ARIMA model.
Complex Models:
This will be discussed at the end.
Use Case
In this example we will:
- Understand the type of data you have.
- Know which forecasting model is best suited to your data and why others are not.
- Know how to evaluate the model you have selected.
- Know how to take your model further.
The Data
According to a recent report by 360 Research Reports:
"The global Snowboard Equipment market was valued at 503.48 Million USD in 2020 and will grow with a CAGR of 6.71% from 2020 to 2027"— 360 Research Reports (2021)
I created a fictitious dataset that you can download here. It shows the number of snowboards sold from 2009–01 to 2020–12.
#Import packages
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plt.rcParams.update({'figure.figsize':(20,18), 'figure.dpi':500})
from numpy import log
import warnings
import itertools
import numpy as np
plt.style.use('fivethirtyeight')
#Load dataset
data = pd.read_excel('Snowboard_Data.xlsx',
index_col ='Date',
parse_dates = True)
# Print the 1st 5 rows of the dataset
data.head()

#We need to ensure the date is a date type and is indexed
data.info()

Let’s plot the data to see what it looks like:
#Plot data
import seaborn as sns
sns.set(font_scale=2)
plt.figure(figsize = (12,10))
#choose a grid style out of darkgrid, whitegrid, dark, white, ticks
sns.set_style("ticks", {"axes.facecolor": "white"})
#reset the index so that "Date" is available as a column for seaborn
sns.lineplot(x = "Date", y = "Quantity", data = data.reset_index(), color='blue', linewidth=3.5)
plt.show()

Reviewing the line plot
We not only see a linear upward trend, but also that the amplitude of the seasonal cycles increases over time. This indicates multiplicative seasonality in our data.
ETS Decomposition
We will now plot the Error, Trend and Seasonality on separate graphs using ETS Decomposition:
# ETS Decomposition --> our data is multiplicative
result = seasonal_decompose(data['Quantity'],
model ='multiplicative')
# ETS plot
result.plot()

The first graph simply plots the original data. The 3 plots below show the Trend, Seasonality and Error (Residuals) components separately. These 3 components combined give you the original data in the first plot. What we see is:
- The trend is increasing over time and there is seasonality (spikes) at regular intervals.
- The residuals show periods of high variability in earlier and later years.
We have a strong upwards trend and seasonality present— which we confirmed is multiplicative.
Modelling multiplicative seasonality + trend
What if we ignored the seasonality or were not interested in it? Then one could use a Simple Moving Average (SMA)/Simple Exponential Smoothing. However, this tends to over-smooth sharp increases or decreases in the data and implicitly assumes that the historical pattern repeats year after year (recall this is not always true).
SMA is the most basic forecasting technique. You take the average of the last N values and this average is then the forecasted value for the following period. Let’s apply SMA to our data to see what happens if we ignore the seasonality (recall SMA is used if you have no seasonality and a weak trend). Note how the irregularities are smoothed out.
# Take the moving average of last 6 observations
rolling = data.rolling(window=6)
rolling_mean = rolling.mean()
# plot the two series (original data and the rolling average)
plt.plot(data)
plt.plot(rolling_mean, color='red')
plt.show()

But we already know that SMA is not designed for our data and so we turn to Exponential Moving Average (EMA)/ Exponential Smoothing (ES). EMA applies exponentially decreasing weights to older observations, since older data is of less importance. We are more interested in more recent observations and so a higher weight is assigned to more relevant data. This removes some of the noise in the data and improves the forecast. This method is only reliable if used for short term forecasts.
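As a quick sketch (not part of the original walkthrough), pandas can compute an exponentially weighted moving average directly, which you can compare against the 6-month simple rolling mean above:
# Exponentially weighted moving average - recent observations get a higher weight
ema = data['Quantity'].ewm(span=6, adjust=False).mean()
plt.plot(data['Quantity'])
plt.plot(ema, color='red')
plt.show()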
3 Types of ES Forecasting Methods
1. Single (Simple) Exponential Smoothing (SES)
SES uses a weighted moving average with exponentially decreasing weights – this is simple because there is no specified structure. It does not handle trends or seasonality and is applied to univariate data (a minimal sketch follows after this list).
2. Double Exponential Smoothing
Holt’s trend-corrected double exponential smoothing is an extension to SES. This does handle trends (additive and multiplicative) and is applied to univariate data i.e.
- Additive Trend: Double Exponential Smoothing with a linear trend.
- Multiplicative Trend: Double Exponential Smoothing with an exponential trend.
3. Triple Exponential Smoothing
Triple Exponential Smoothing/Multiplicative Holt-Winters/Holt-Winters Exponential Smoothing – is an extension of Double Exponential Smoothing that, and you guessed it, handles trends and seasonality, for a univariate time series.
- Additive Seasonality: Triple Exponential Smoothing with a linear seasonality.
- Multiplicative Seasonality: Triple Exponential Smoothing with an exponential seasonality.
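For contrast, here is a minimal SES sketch on our series (not part of the original walkthrough); because SES models neither trend nor seasonality, every forecasted month gets the same level:
# Single Exponential Smoothing gives a flat forecast - no trend, no seasonality
from statsmodels.tsa.api import SimpleExpSmoothing
ses_fit = SimpleExpSmoothing(data['Quantity']).fit()
print(ses_fit.forecast(12))  # 12 identical forecasted values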
Now we will apply Double and Triple ES to our data.
Double and Triple ES
from statsmodels.tsa.api import ExponentialSmoothing
# Split data into train / test sets - set one year (12 months) aside for testing
train = data.iloc[:len(data)-12]
test = data.iloc[len(data)-12:]
# Fit Holt-Winters (Triple ES) with multiplicative trend and seasonality
fit1 = ExponentialSmoothing(np.asarray(train['Quantity']), seasonal_periods=12, trend='multiplicative', seasonal='multiplicative').fit()
# Fit a second model with the Box-Cox power transformation - this takes the original (non-normally distributed) data as input and returns the fitted data (normally distributed)
# See source: (GeeksforGeeks, 2021) - https://www.geeksforgeeks.org/box-cox-transformation-using-Python/
EMA_fit = ExponentialSmoothing(train['Quantity'], seasonal_periods=12, trend='multiplicative', seasonal='multiplicative').fit(use_boxcox=True)
# Make a prediction for the length of the test set
fcast = EMA_fit.forecast(len(test)) # i.e., 12 months
# Since the series is monthly data and the seasonal period repeats each year, seasonal_periods=12
y_hat_avg = test.copy()
y_hat_avg['Holt_Winter'] = fit1.forecast(len(test))
# Holt-Winters is ES that can be additive or multiplicative - we know we have multiplicative seasonality
ax = train.plot(figsize=(10,6), marker='o', color='black', title="Forecasts from Exponential Smoothing")
ax.set_ylabel("Quantity Snowboards Sold")
ax.set_xlabel("Index")
EMA_fit.forecast(12).rename('ES Forecast').plot(ax=ax, marker='o', color='blue', legend=True)
y_hat_avg.Holt_Winter.rename('Holt_Winter - Test').plot(ax=ax, style='--', marker='o', color='red', legend=True)

# Put the ES forecast into a DataFrame with a descriptive column name
forecast_es = pd.DataFrame(fcast).rename(columns={0: "ES_Forecast"})
forecast_es

# Load specific evaluation tools
# We will now calculate RMSE to check the accuracy of our model
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
from math import sqrt
rms = sqrt(mean_squared_error( forecast_es.ES_Forecast, test.Quantity))
print(rms)
#The value we get is: 34.69362278646293
What does this 34.69 mean? It is the Root Mean Squared Error (RMSE), which is one of the ways to measure the performance of a forecasting model.
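As a sanity check, the same number can be computed by hand - RMSE is the square root of the average squared difference between the actual and forecasted values (this snippet reuses the test and forecast_es objects from above):
# RMSE = sqrt(mean((actual - forecast)^2))
errors = test['Quantity'].values - forecast_es['ES_Forecast'].values
print(np.sqrt(np.mean(errors ** 2)))  # should match the value reported above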
We saw why SMA doesn’t work with a time series containing a trend and seasonality. For completeness, let’s look at why ARIMA would not work.
Why ARIMA fails
ARIMA does not support seasonality. We see this in the fact that ordinary differencing does not get rid of the increasing amplitude. Even after differencing the data twice, the series is still non-stationary. This means there is strong seasonality in our data that we cannot ignore.
#Original Series
fig, axes = plt.subplots(3, 2, sharex=False)
axes[0, 0].plot(data.Quantity); axes[0, 0].set_title('Original Series')
plot_acf(data.Quantity, ax=axes[0, 1]) #autocorrelation plot
#1st Differencing
axes[1, 0].plot(data.Quantity.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(data.Quantity.diff().dropna(), ax=axes[1, 1]) #autocorrelation plot
#2nd Differencing
axes[2, 0].plot(data.Quantity.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(data.Quantity.diff().diff().dropna(), ax=axes[2, 1]) #autocorrelation plot
plt.show()

SARIMA/SARIMAX

SARIMA makes use of seasonal differencing.
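To see what seasonal differencing does, here is a quick sketch (not in the original walkthrough) that subtracts from each month the value from the same month a year earlier; this is what removes the repeating yearly pattern that ordinary differencing could not:
# Seasonal differencing at lag 12: compare each month with the same month last year
seasonal_diff = data['Quantity'].diff(12).dropna()
seasonal_diff.plot(figsize=(12, 5), title='Seasonally differenced series (lag 12)')
plt.show()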
We will apply the auto_arima() function to our data. This function has the option of using a stepwise approach to search multiple combinations of the p, d, q parameters used in the model. To identify the best combination of p, d and q, the model with the lowest AIC is selected.
Just to clarify what Akaike information criterion (AIC) means:
We are making use of a statistical model (SARIMA) to represent the process that generated our snowboard sales data. Because this is only a model and not the true process, information is lost. The lower the AIC, the less information is lost by the model, and the better our model is at representing the process. AIC also takes the complexity of the model into account, which guards against overfitting the data.
Whenever the stepwise search finds a model with a lower AIC, that model becomes the new best model; the search stops once no nearby candidate improves on the current best.
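To make the AIC comparison concrete, here is a minimal illustrative sketch (the specifications are hand-picked for the example) that fits two candidate SARIMAX models and prints their AICs - auto_arima simply automates this comparison across many (p,d,q)(P,D,Q,m) combinations:
# Compare the AIC of two candidate specifications by hand - lower AIC is better
from statsmodels.tsa.statespace.sarimax import SARIMAX
for order, seasonal_order in [((0, 1, 1), (2, 1, 1, 12)), ((1, 1, 0), (0, 1, 1, 12))]:
    fit = SARIMAX(data['Quantity'], order=order, seasonal_order=seasonal_order).fit(disp=False)
    print(order, seasonal_order, round(fit.aic, 2))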
Below is the code for the stepwise results:
# Import library
from pmdarima import auto_arima
# Supress warnings that are not important
import warnings
warnings.filterwarnings("ignore")
# Apply auto_arima to data
sw_fit = auto_arima(data['Quantity'], start_p = 1, start_q = 1,
max_p = 3, max_q = 3, m = 12,
start_P = 0, seasonal = True,
d = None, D = 1, trace = True,
error_action ='ignore',
suppress_warnings = True,
stepwise = True)#We are using the stepwise approach
# Summarise the stepwise results
sw_fit.summary()

The best model is the SARIMAX(0,1,1)x(2,1,1,12). We will now split our data into a training and test set and apply this model to the training set:
# Split data into train and test sets
train = data.iloc[:len(data)-12]
test = data.iloc[len(data)-12:] # set one year(12 months) for testing
# Fit a SARIMAX(0, 1, 1)x(2, 1, 1, 12) on the training set
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(train['Quantity'],
order = (0, 1, 1),
seasonal_order =(2, 1, 1, 12))
result = model.fit()
result.summary()

Now we will compare our model’s one-year predictions against our test set:
start = len(train)
end = len(train) + len(test) - 1
# Predictions for one-year against the test set
pred = result.predict(start, end,
typ = 'levels').rename("Predictions")
# plot predictions and actual values
pred.plot(figsize = (12, 5), legend = True)
Quantity =test['Quantity']
Quantity.plot(legend = True)

# Load specific evaluation tools
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
# Calculate root mean squared error
#RMSE is always greater than or equal to 0, where 0 indicates that our model has a perfect fit (theoretically possible, but hardly achieved in practice)
rmse(test["Quantity"], pred)
#The value we get: 25.715869516322755
#This is a much lower RMSE than our previous model
We now fit our model on the full dataset:
# Train the model on the full dataset
model = SARIMAX(data['Quantity'],
order = (0, 1, 1),
seasonal_order =(2, 1, 1, 12))
result = model.fit()
# Forecast for the next 3 years (36 months)
fcast_sarima = result.predict(start = len(data),
end = (len(data)-1) + 3 * 12,
typ = 'levels').rename('Forecast')
# Plot the forecast values
data['Quantity'].plot(figsize = (12, 5), legend = True)
fcast_sarima.plot(legend = True)

This forecast is in line with the findings in the 360 Research Report.
Let’s look at some diagnostics:
sns.set(font_scale=1)
result.plot_diagnostics(figsize=(13 , 10))
plt.show()

A quick analysis of our diagnostic graphs:
- Standardised Residuals (top left): This shows the residuals over time. There should be no pattern here; the more the residuals look like random noise, the better, as it means we have captured the trend and seasonality. If a pattern remained, it would mean we haven't identified the trend or seasonality.
- KDE Plot (top-right): If the orange density line (KDE) closely follows the green line, then the distribution of the residuals is normally distributed.
- Normal qq-plot (bottom left): This is a visual check to see if our dependent variable is normally distributed. If the blue dots closely follow the red line, then our dependent variable follows a Normal distribution. This is an implicit assumption we made but is required for our statistical analysis to be valid.
- Correlogram (bottom right): If the majority of the blue dots fall into the blue shaded area, then our residuals have little correlation with the lagged versions of themselves (a complementary Ljung-Box check is sketched below).
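The Ljung-Box test formalises what the correlogram shows; here is a small sketch (not in the original walkthrough) using the fitted result from above - large p-values suggest the residuals are not autocorrelated:
# Ljung-Box test on the SARIMA residuals - a formal check for leftover autocorrelation
from statsmodels.stats.diagnostic import acorr_ljungbox
print(acorr_ljungbox(result.resid, lags=[12]))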
The diagnostics graphs indicate that we have found a well-fit model suitable for our dataset. Lastly, recall the RMSEs we found:
ES: 34.693
SARIMA: 25.72
SARIMA is an improvement on ES, since it produced a lower RMSE.
Complex Models
You may have come across various forecasting libraries and algorithms that can further improve upon SARIMA models, so it is worth mentioning some next steps.
LinkedIn released a time series forecasting library called Greykite, which contains Silverkite, a forecasting algorithm that automates much of the forecasting process. Similarly, Facebook released Prophet, a forecasting algorithm that helps deal with spikes and changing trends in the data (a minimal Prophet sketch appears after the comparison below).
Prophet vs Silverkite:
A more detailed comparison of the two can be found in each library's documentation.
Prophet model
● Created by Facebook.
● It can handle missing data, outliers, seasonal effects and sudden changes in time series data e.g., Product promotions or a World cup etc.
● It is open source, and you can use both Python and R.
Silverkite model
● Created by LinkedIn.
● Allows for the customisation of regressors to account for seasonality, changes in data points, growth, interaction terms and autoregression.
● The preferred option if fast deployment is important.
● Various visualisation options that increase interpretability.
● Forecasts quantiles, not just averages.
● Comes with built-in templates.
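If you want to try Prophet on this dataset, a minimal sketch might look like the following (this assumes the prophet package is installed; the ds/y column names are what Prophet expects, and the multiplicative seasonality setting mirrors what we established earlier):
# Minimal Prophet sketch on the snowboard data
from prophet import Prophet
df = data.reset_index().rename(columns={'Date': 'ds', 'Quantity': 'y'})
m = Prophet(seasonality_mode='multiplicative')
m.fit(df)
future = m.make_future_dataframe(periods=24, freq='MS')  # forecast two more years of months
forecast = m.predict(future)
m.plot(forecast)
plt.show()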
Conclusion
Looking back at our fictitious dataset, it makes sense that the sale of snowboards shows seasonality, because it is a winter sport. We also saw from the 360 Research Reports figures that snowboard sales are expected to increase over the next couple of years.
We saw that over-simplifying your model (ES) does not capture all the nuances in the data. Taking it a bit further (SARIMA) shows how adding an additional layer of complexity can drastically improve results. However, can we take it even further? This is where the art of modelling comes in, because just as one can over-simplify a model, one can also overcomplicate it.
Nevertheless, I would love to hear your feedback on how you would model this data.
Possible extensions and considerations:
● A typical winter season is classified into off-peak, shoulder and peak seasons.
● Unexpected closing of resorts due to the COVID period.
● Adding additional explanatory variables to the data to make it multivariate.
