
We all want to know the future. Imagine the power we’d possess if we knew what was coming: we could alter events to get more suitable results, bet on them for financial gain, or simply budget better. Although we cannot outright determine what will happen, we can build some intuition about what the future may look like.
We often hear from self-improvement evangelists that we can understand how we arrived at our present point in life by reflecting on our past actions, and that, to some degree, we can predict the trajectory of our lives if we continue on a particular path. This is the essence of time-series analysis. As stated in Adhikari, R. and Agrawal, R. (2013), _An Introductory Study on Time Series Modelling and Forecasting_: "The main aim of time-series modelling is to carefully collect and rigorously study the past observations of a time-series to develop an appropriate model which describes the inherent structure of the series. This model is then used to generate future values for the series, i.e. to make forecast. Time-series forecasting thus can be termed as the act of predicting the future by understanding the past."
"The present moment is an accumulation of past decisions" – Unknown
A popular and frequently used stochastic time-series model is the ARIMA model. It assumes that the time-series is linear and follows a particular known statistical distribution, such as the normal distribution. ARIMA builds on a family of related models, including the Autoregressive (AR) model, the Moving Average (MA) model, and the Autoregressive Moving Average (ARMA) model. Before effectively applying ARIMA to a problem, there are some things we should understand about our data, as you’ll come to see by the end of this post.
Things to know – The 4 main components of a time series:
Trend → The propensity of a time series to increase, decrease or stagnate over a long period of time.
Seasonality → Fluctuations within a year that are regular and predictable.
Cyclical → Medium term changes in the series that repeat in cycles.
Irregularities → Unpredictable influences that are not regular and do not repeat in a particular pattern.
Data
To download the data, click this Link and follow the instructions.
The data I will be using in this article is from the M5 Forecasting – Accuracy competition on Kaggle, which is still live at the time of writing. Competitors have been provided with hierarchical sales data – thanks to Walmart – from stores in 3 different states (California, Texas and Wisconsin) and are challenged to forecast sales 28 days into the future. The code generated in this article is available in a Kaggle Notebook that I created, which can be found here or in the link below.
Here are the frameworks we must import to perform the task at hand.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.stattools import adfuller
I have done some preprocessing to the data to make use of its hierarchical structure.
# store of the sales data columns
d_cols = full_df.columns[full_df.columns.str.contains("d_")]
# group columns by store_id
df = full_df.groupby(full_df["store_id"]).sum()[d_cols].T
df.head()

The competition is evaluated on RMSSE (Root Mean Squared Scaled Error), which is derived from the MASE (Mean Absolute Scaled Error) that was designed to be scale-invariant and symmetric – you can learn more about forecast accuracy metrics here. (The difference for this competition is that the A (Absolute) in MASE is replaced with S (Squared) to give the Mean Squared Scaled Error, and we take the root of this for RMSSE.)
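Concretely, for a single series the RMSSE scales the mean squared forecast error over the horizon by the mean squared one-step naive-forecast error on the training data:
RMSSE = sqrt( [ (1/h) * Σ_{t=n+1}^{n+h} (y_t - ŷ_t)^2 ] / [ (1/(n-1)) * Σ_{t=2}^{n} (y_t - y_{t-1})^2 ] )
where n is the length of the training series and h = 28 is the forecasting horizon.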
Concept of Stationarity
Understanding the concept of stationarity is important as it has high impact on the type of model that we can fit to our data to forecast future values. We refer to a time-series as stationary when its properties do not depend on the time at which the series was observed. Some criteria for stationarity are as follows:
- Constant mean in the time-series
- Constant variance in the time-series
- No seasonality
Simply, a time-series that is stationary will have no predictable patterns in the long term. For the mathematicians, a random process is said to be stationary when its joint distribution does not change when the series is shifted in time. Let’s look at some random items from our data to see whether they are stationary.
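For instance, we can plot a few stores directly. Here is a minimal sketch; the store_id column names ("CA_1", "TX_1", "WI_1") are assumed to follow the M5 naming scheme.
# a minimal sketch: plot a few stores to eyeball trend and seasonality
# (column names assume the M5 store_id naming scheme)
df[["CA_1", "TX_1", "WI_1"]].plot(subplots=True, figsize=(12, 8))
plt.show()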
The ARMA model is a combination of the Autoregressive and Moving Average models. This traditional approach requires the data to be stationary; however, things do not always work out as we’d expect in the real world. In fact, real-world data is much more likely to be non-stationary, hence the birth of ARIMA, which uses a clever technique called differencing to make non-stationary data stationary.
Differencing
Differencing computes the change between consecutive observations in the original series, which helps to stabilize the mean since it removes changes in the level of a series – this has the effect of eliminating (or reducing) trend and seasonality. The technique is widely used for non-stationary data such as financial and economic data. The ARIMA model adopts differencing to convert a non-stationary time-series into a stationary one. We can express the differenced series mathematically as shown in Figure 2.
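In symbols, this ordinary (lag-1) difference is:
y'_t = y_t - y_{t-1}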

When the differenced data still does not appear to be stationary, we can difference a second time – in practice, it is almost never necessary to go beyond second-order differencing – which can be expressed mathematically as in Figure 3.
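The second difference is simply the difference of the already-differenced series:
y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}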

We can also take the difference between an observation and the corresponding observation from the same season in the previous period. This is known as seasonal differencing.
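With m observations per season (for example m = 7 for weekly seasonality in daily sales data), the seasonally differenced series is:
y'_t = y_t - y_{t-m}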

Occasionally, we may need to take both the ordinary differences (the technique discussed in Figure 2, referred to as first differences, i.e. differences at lag 1) and seasonal differences to make our data stationary.
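Applying a seasonal difference followed by an ordinary first difference (it makes no difference which is applied first) gives:
y''_t = (y_t - y_{t-m}) - (y_{t-1} - y_{t-m-1})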

In Python, we can use visualization and/or a unit root test to determine whether differencing is required for our data – note that there are other methods to determine stationarity. There are many different unit root tests with different assumptions, but we will be using the (Augmented) Dickey-Fuller test. Below, I visualize a store – try to decide for yourself whether it looks stationary before you look at the results of the Dickey-Fuller test.

# Dickey-fuller statistical test
def ad_fuller(timeseries: pd.DataFrame, significance_level=0.05):
    non_stationary_cols = []
    stationary_cols = []
    for col in timeseries.columns:
        # run the Augmented Dickey-Fuller test on each column,
        # letting AIC choose the number of lags
        dftest = adfuller(timeseries[col], autolag="AIC")
        if dftest[1] > significance_level:
            # p-value above the significance level: we cannot reject the null
            # hypothesis of a unit root, so the series is non-stationary
            non_stationary_cols.append(
                {col: {"Test Statistic": dftest[0],
                       "p-value": dftest[1],
                       "# Lags": dftest[2],
                       "# Observations": dftest[3],
                       "Critical Values": dftest[4],
                       "Stationary": False}})
        else:
            stationary_cols.append(
                {col: {"Test Statistic": dftest[0],
                       "p-value": dftest[1],
                       "# Lags": dftest[2],
                       "# Observations": dftest[3],
                       "Critical Values": dftest[4],
                       "Stationary": True}})
    return non_stationary_cols, stationary_cols
non_stationary_cols, stationary_cols = ad_fuller(df[stores])
len(non_stationary_cols), len(stationary_cols)
>>>> (10, 0)
non_stationary_cols[0]

The p-value is greater than the significance level we set (0.05), therefore we do not reject the null hypothesis that there is a unit root in our data. In other words, our data is non-stationary – it does not meet the criteria for stationarity described above – hence we must do some differencing to make it stationary. Pandas has a handy DataFrame.diff() function that does this for us – you can read more in the documentation here.
# making the data stationary
df["lag-1_CA_1"]= df["CA_1"].diff().fillna(df["CA_1"])
ACF and PACF plots
The ARIMA model has hyperparameters p, d and q that must be defined. Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots make determining the orders p and q of the model easier.
The ACF plot shows the autocorrelation of the time-series, i.e. the relationship between y_t and y_{t-k}. Put simply, it displays the coefficients of correlation between a time-series and lags of itself.
Note: y_t denotes the value of the series at time t, and y_{t-k} the value k periods earlier.
The PACF plot shows the relationship between y_t and y_{t-k} after the effects of the intermediate lags have been removed. If we think of correlation as the interdependence of variables, then "partial" correlation is the correlation between two variables that is not explained by their mutual correlations with a specified set of other variables. Applied to autocorrelation, it is the correlation between a time-series and a lag of itself that is not explained by correlations at lower-order lags.
This is a great resource to learn more about ACF and PACF plots.
Let’s see some of our visualizations…
_, ax = plt.subplots(1, 2, figsize=(10, 8))
plot_acf(df["lag-1_CA_1"], lags=10, ax=ax[0])
plot_pacf(df["lag-1_CA_1"], lags=10, ax=ax[1])
plt.show()

This suggests that we should try to fit an AR(8) model to our data, which I have done in the next section.
Lag/Backshift Notation
Lag/Backshift notation is an extremely useful notational device. Various sources use different symbols for it: L for lag, or B for backshift.
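The backshift operator shifts the series back one period: B y_t = y_{t-1}, and more generally B^k y_t = y_{t-k}. A first difference can then be written compactly as y'_t = y_t - y_{t-1} = (1 - B) y_t.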

The Autoregressive, AR(p), model generates forecasts using a linear combination of past values of the variable. We can think of autoregression as a regression of the variable against itself.
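An AR(p) model can be written as:
y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + ε_t
where ε_t is white noise.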

The Moving Average, MA(q), model, on the other hand, uses past forecast errors rather than past values in a regression-like model. We can therefore think of each forecast as a weighted moving average of the past few forecast errors.
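An MA(q) model can be written as:
y_t = c + ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q}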

The ARIMA model
All roads lead to this point. If we combine differencing, our autoregression model and moving average model, we get ARIMA(p, d, q).
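Written out in full, the ARIMA(p, d, q) model regresses the differenced series on its own lags and on past forecast errors:
y'_t = c + φ_1 y'_{t-1} + ... + φ_p y'_{t-p} + θ_1 ε_{t-1} + ... + θ_q ε_{t-q} + ε_t
where y'_t is the (possibly multiply) differenced series.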

Note that it is often much easier to use lag notation to denote the ARIMA model. You can learn more about how to do this here.
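In backshift notation, the same ARIMA(p, d, q) model reads:
(1 - φ_1 B - ... - φ_p B^p)(1 - B)^d y_t = c + (1 + θ_1 B + ... + θ_q B^q) ε_t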
p = The order of the Autoregressive part of the model
d = The degree of first differencing in our model
q = The order of the Moving Average part of the model

# fitting the model
model = ARIMA(df["lag-1_CA_1"], order=(8, 1, 0))
results = model.fit(disp=-1)

# visualizing the fitted values
fig = go.Figure(data=[
    go.Scatter(x=df["date"],
               y=df["lag-1_CA_1"],
               name="original",
               showlegend=True,
               marker=dict(color="blue"))])
fig.add_trace(
    go.Scatter(x=df["date"],
               y=results.fittedvalues,
               name="fitted values",
               showlegend=True,
               marker=dict(color="red")))
fig.update_layout(
    title="Fitted values",
    xaxis_title="Dates",
    yaxis_title="Units Sold",
    font=dict(
        family="Arial, monospace",
        size=14,
        color="#7f7f7f"
    )
)
fig.show()

We can have a closer look…
# a closer look
_, ax = plt.subplots(figsize=(12, 8))
results.plot_predict(1799, 1940, dynamic=False, ax=ax)
plt.show()

To see how we did against the actual values, we must first return to the original scale of the data for comparison. Pandas has a useful cumsum() function we can use to undo the differencing – Documentation.
compare_df = pd.DataFrame({"actual": df["CA_1"],
                           "predictions": pd.Series(results.fittedvalues.cumsum(), copy=True),
                           "d": df["d"]}).set_index("d")
compare_df.loc["d_1", "predictions"] = 0
Then we plot this…
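Here is a minimal sketch of that plot, assuming the compare_df built above:
# a minimal sketch of the comparison plot, assuming compare_df from above
_, ax = plt.subplots(figsize=(12, 8))
compare_df.plot(ax=ax)
ax.set_xlabel("d")
ax.set_ylabel("Units Sold")
plt.show()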

I have joined this competition a little late, but there is still sufficient time to improve on this result (which I will be sharing with you all).
Useful Resource: Workflow Guide
Rob Hyndman provides an extremely useful flow chart of the general forecasting workflow in the book Forecasting: Principles and Practice. A link to the online book is in the Other Resources section below.

Final Word
Thank you for taking the time to read this article. I am a self-taught Data Scientist from London, England. I can be reached on LinkedIn at https://www.linkedin.com/in/kurtispykes/. Please do not hesitate to reach out to me – meeting new people is awesome.
Other Resources:
Forecasting: Principles and Practice, 2nd Edition
An Introductory Study on Time Series Modelling and Forecasting