
Beginner’s Introduction to Time Series Analysis and Forecasting

Stationarity, time series decomposition, ARIMA modelling, and more

Photo by Di_An_h on Unsplash

I recently fell in love with the topic of time series. I find it fascinating that so many things that we observe in the world around us, and perhaps things that we sometimes take for granted can actually be explained using time.

Think about it, road traffic tends to peak during certain hours during the day, ice-cream sales are usually higher during the summer, or more recently, how the trend of COVID-19 cases changes over time.

There is so much you can tell simply by examining how a variable behaves and changes over time. In Data Science, this is what we call time series analysis. A time series is a sequence of data points indexed in time order, usually recorded at successive, equally spaced points in time.

In this blog post, I will provide a gentle introduction to time series as well as share some basic terminology and concepts to help you get started in this exciting space. Specifically, we will cover:

  • The concept of stationarity
  • Why stationarity is important in time series analysis and forecasting
  • How to test and generate a stationary time series process
  • How to choose the right ARIMA model, test the fit of the model as well as generate future predictions

The code that goes along with this article can be found on my GitHub here.


Transform data into time series object

In order to work properly with time series data in R, we first have to define the data as a time series object using the ts() function.

testing.stationarity = read.table("testing.stationarity.txt")
head(testing.stationarity)
class(testing.stationarity)
Data frame object; Image by Author
testing.stationarity = ts(testing.stationarity)
head(testing.stationarity)
class(testing.stationarity)
Time series object; Image by Author

The data is exactly the same, just stored as different classes in R.
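Our example data has no calendar structure, so ts() is called with its defaults. For data observed at a regular calendar frequency, ts() also accepts start and frequency arguments. Here is a minimal sketch with made-up monthly values (the numbers are purely hypothetical):

# Hypothetical monthly series starting in January 1974
monthly.values = c(10, 12, 15, 14, 13, 16, 18, 17, 15, 13, 11, 12)
monthly.ts = ts(monthly.values, start=c(1974, 1), frequency=12)
monthly.ts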


Testing stationarity

Before we proceed to explore the various methods for testing stationarity in a time series process, it is helpful to first explain what stationarity is.

Stationarity is a crucial concept in time series analysis. Most time series models, such as the ARIMA model we will discuss later, assume that the underlying process is stationary, meaning its statistical properties do not change over time. In other words, before we can fit a time series model to the data and use it to generate predictions, we first need to ensure that the time series is stationary.

A time series is considered stationary if it satisfies the following three conditions (a short simulated illustration follows the list):

  1. The expected value (mean) is constant over time
  2. The volatility (variance) of the time series around its mean is constant over time
  3. The covariance between two observations depends only on the time lag between them, not on the point in time
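As a quick illustration of the first two conditions, here is a minimal simulated sketch (the seed and sample size are arbitrary choices): white noise has a constant mean and variance and is stationary, whereas a random walk drifts over time and is not.

# Simulated comparison: white noise (stationary) vs random walk (not stationary)
set.seed(1)
stationary.series = ts(rnorm(500))
random.walk = ts(cumsum(rnorm(500)))
par(mfrow=c(2, 1))
ts.plot(stationary.series, main="White noise: stationary", ylab="Value")
ts.plot(random.walk, main="Random walk: not stationary", ylab="Value")
par(mfrow=c(1, 1))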

If this sounds confusing, don’t worry: there are also visual cues you can look out for to quickly identify whether or not a time series is stationary. In particular, watch for two components: trend and seasonality. A time series is not stationary if either of these components is present in the data.

Trend is the long-term trajectory of a time series. For instance, stock market indices such as the S&P 500 have seen an overall increasing trend over the last several decades.

Seasonality, on the other hand, represents a recurring pattern that happens at a fixed and known frequency. For example, annual retail sales tend to spike during the Christmas season.

There are two methods to test for stationarity in a time series:

  1. Phillips-Perron Unit Root Test
  2. Plotting the sample ACF

The Phillips-Perron unit root test, PP.test() in R, tests the null hypothesis that a time series is integrated of order one, in other words, that it requires further differencing to achieve stationarity.

PP.test(testing.stationarity)
Image by Author

Since the p-value is greater than 0.05, we do not have sufficient evidence to reject the null hypothesis. We therefore conclude that the time series is not stationary and needs to be differenced.

Alternatively, we could also plot the autocorrelation function (ACF) of the time series, which informs us of the correlation of the series with its lagged values.

acf(testing.stationarity, main="Data: testing.stationarity", ylab="Sample ACF")
Image by Author

As we can see from the ACF plot above, the autocorrelation decays slowly, which indicates that the time series is not stationary.


Removing trend

Now that we know trend and seasonality can cause a time series to be non-stationary, let’s explore ways to remove them.

There are a few ways to remove trends from a time series. Here, I have outlined two of the simplest methods:

  1. Differencing
  2. Least squares trend removal

1. Differencing

Differencing means taking the difference between each data point and the data point a given lag before it.

Here, we contrast the time series from above with its lag-one differenced counterpart, along with their respective autocorrelation functions.

# Difference data using lag=1 and differences=1
Xt = diff(testing.stationarity, lag=1, differences=1)

# Plot original data, differenced data, and their respective sample ACFs 
par(mfrow=c(2,2))
ts.plot(testing.stationarity, main="Data: testing.stationarity",
        ylab="Value")
ts.plot(Xt, main="Differenced data", ylab="Change in value")
acf(testing.stationarity, main="", ylab="Sample ACF")
acf(Xt, main="", ylab="Sample ACF")
par(mfrow=c(1,1))
Image by Author

We can clearly see that the trend that was once present in the original time series is now gone. Furthermore, the ACF now decays significantly faster.

Let’s perform the Phillips-Perron Unit Root Test to confirm our observation.

# Perform unit root test on differenced data 
PP.test(Xt)
Image by Author

Since the p-value for the differenced data is lower than 0.05, we reject the null hypothesis and conclude that the series does not need to be differenced again; it is now stationary.

Differencing: bonus!

To determine how many times to difference a time series, we can choose the order of differencing that gives the lowest overall variance.

# Try differencing orders 0-7 and store their respective variances in a data frame
ts.var = var(testing.stationarity)
for(i in 1:7) {
  diffed = diff(testing.stationarity, lag=1, differences=i)
  ts.var[i+1] = var(diffed)
}
ts.var.df = data.frame(diff=0:7, var=ts.var)

# Plot variance against the number of times data is differenced 
plot(ts.var.df, type="l", ylab="Variance", xlab="d")
Overall variance is lowest at d=1; Image by Author

Variance is lowest when the time series process is differenced only once and increases subsequently. Hence, we conclude that the data does not need to be differenced a second time.

2. Least squares trend removal

Least squares trend removal, on the other hand, involves fitting a linear model to the time series and subtracting the fitted values from the data points.

Here, we have a different time series with an upward, positive trend over time.

# Generate time series data 
set.seed(123)
n = 1000
sim = arima.sim(list(ar=0.9), n)
xt = 2000+cumsum(sim)

# Plot generated time series 
ts.plot(xt, col="blue", main="Time series with trend", ylab="Data")
Image by Author

We can see that the simulated time series has an overall increasing trend. To remove this trend, we will fit a linear model to the data and then subtract the fitted values from the data points to obtain the residuals.

If done properly, the residuals should have no remaining trend, or in other words, have a constant mean over time.

# Fit a linear model on time series, extract fitted values and residuals 
time = time(xt)
fit = lm(xt ~ time)
yt = fit$fitted.values
zt = fit$residuals

# Plot time series with superimposed linear model and residuals 
par(mfrow=c(2,1))
ts.plot(xt, col="blue", main="Regression example", ylab="Data")
abline(fit, col="red")
plot(xt-yt, type="l", col="green", xlab="Time", ylab="Residuals")
par(mfrow=c(1, 1))
Image by Author

Removing seasonality

In addition to trend, seasonality can also cause a time series to be non-stationary.

Here, I will demonstrate three methods to eliminate seasonality from a time series:

  1. Seasonal differencing
  2. Seasonal means
  3. Method of moving averages

Before we begin, let’s briefly discuss the ldeaths dataset in R. It contains monthly deaths from bronchitis, emphysema and asthma in the UK between 1974 and 1979, for both males and females.

# Plot ldeaths 
plot(ldeaths, main="Monthly deaths from lung diseases in the UK", ylab="Deaths")
points(ldeaths, pch=20)

# Add red vertical line at the start of each year  
abline(v=1974:1980, col="red")

# Plot sample ACF of ldeaths 
acf(ldeaths, main="Sample ACF of ldeaths", ylab="Sample ACF", lag.max=36)
Image by Author

There is an obvious yearly seasonal effect in these charts. Although the highest point each year does not necessarily fall in the same month, a clear annual pattern is still present in the data.

1. Seasonal differencing

Seasonal differencing means subtracting from each data point the observation one seasonal period earlier, in this case 12 months.

# Difference ldeaths using lag=12 i.e. January 1975 minus January 1974, February 1975 minus February 1974, and so on 
sdiff.ldeaths = diff(ldeaths, lag=12, differences=1)

# Plot original data, differenced data, and their respective sample ACFs   
par(mfrow=c(2,2))
ts.plot(ldeaths, main="Data: ldeaths", ylab="Number of deaths")
acf(ldeaths, main="Sample ACF of ldeaths", ylab="Sample ACF")
ts.plot(sdiff.ldeaths, main="Data: sdiff.ldeaths", ylab="Difference in number of deaths")
acf(sdiff.ldeaths, main="Sample ACF of sdiff.ldeaths", ylab="Sample ACF")
par(mfrow=c(1,1))
Image by Author

2. Seasonal means

Seasonal means involves subtracting from each data point its respective group average, which in this scenario is the monthly average.

# Generate ldeaths as dataframe 
ldeaths.df = data.frame(year=rep(1974:1979, each=12),
                        month=rep(1:12, 6),
                        value=ldeaths)
head(ldeaths.df, 12)
First 12 data points of the ldeaths dataset; Image by Author
# Monthly averages of ldeaths dataset 
xbars = aggregate(value ~ month, data = ldeaths.df, mean)
xbars
Monthly averages of the ldeaths dataset; Image by Author
# Subtract each month in ldeaths by their respective means
yt = ldeaths - xbars$value

# Plot ldeaths after subtracting seasonal means 
par(mfrow=c(2, 1))
plot(yt, main="Monthly deaths from lung diseases in the UK", ylab="Deaths")
points(yt, pch=20)
acf(yt, main="Sample ACF of the series ldeaths less seasonal means", ylab="Sample ACF", lag.max=36)
par(mfrow=c(1, 1))
Image by Author

We can see from both approaches above that the new time series no longer exhibits the seasonality originally present in the data.

3. Method of moving averages

Last but not least, the method of moving averages estimates the trend of a time series by smoothing it with a moving average, typically over a window equal to the seasonal period, and then removes that trend from the data.
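For monthly data, this is typically a centred 12-term moving average. As a rough sketch of the idea, using the ldeaths series introduced above and assuming an additive decomposition, the trend can be estimated manually with filter(); it should closely match the trend component that decompose() returns below.

# Centred 12-month moving average; end weights are halved because the period is even
ma.trend = filter(ldeaths, filter=c(0.5, rep(1, 11), 0.5)/12, sides=2)
ts.plot(ldeaths, ma.trend, col=c("grey", "red"), main="ldeaths with moving average trend", ylab="Deaths")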

We can decompose a time series to separate out its trend, seasonal and random components.

# Decompose ldeaths into its trend, seasonal and random components 
plot(decompose(ldeaths))
Image by Author
# Store the trend, seasonal and random components as individual variables
decomp = decompose(ldeaths) 
trend = decomp$trend
seasonal = decomp$seasonal
random = decomp$random

# Plot data, trend and seasonal + trend  
ts.plot(ldeaths, ylab="", main="Components of time series: ldeaths", col="grey")
lines(trend, col="red")
lines(seasonal+trend, col="blue")
legend("topright", legend=c("Data", "Trend", "Seasonal + trend"), col=c("grey", "red", "blue"), lty=1)
Image by Author

Model fitting

The ARIMA(p, d, q) model is one of the most commonly used time series models. It is made up of three hyperparameters:

  • p = the order of the autoregressive (AR) component
  • d = the degree of differencing (I)
  • q = the order of the moving average (MA) component

Although there are existing functions such as auto.arima() in the forecast package that can help us determine these hyperparameters, in this section we will work through some examples to learn how to manually select values for p, d, and q.

Specifically, we will be investigating the ACF and Partial ACF plots of two separate time series processes to determine the hyperparameters for our ARIMA models.

Choosing the right ARIMA model: Example 1

# Read time series data 
data = read.csv("fittingmodelEg1.csv", header=F)
data = ts(data[, 1])

# Plot data, ACF and partial ACF 
m = matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE)
layout(m)
ts.plot(data, ylab="")
acf(data,main="")
pacf(data,main="")
par(mfrow=c(1,1))
Image by Author

We previously covered the autocorrelation function (ACF), which tells us the correlation of the series with its lagged values. The partial autocorrelation function (PACF), on the other hand, measures the correlation between the series and each lagged value after the effects of the intermediate lags have been removed.
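To build intuition for reading these plots, here is a small simulated sketch (the AR coefficient and seed are arbitrary): for an AR(1) process the ACF decays gradually while the PACF cuts off after lag 1, and the reverse pattern holds for a pure MA process.

# Simulated AR(1): ACF decays gradually, PACF cuts off after lag 1
set.seed(42)
ar1.sim = arima.sim(list(ar=0.7), n=500)
par(mfrow=c(1, 2))
acf(ar1.sim, main="ACF of simulated AR(1)")
pacf(ar1.sim, main="PACF of simulated AR(1)")
par(mfrow=c(1, 1))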

Looking at the example 1 time series plot, we can see that there is no obvious trend and the variance also appears reasonably constant over time.

The ACF does not decrease slowly, which suggests that the process does not need to be differenced, in other words, we set d=0. Moreover, the ACF drops abruptly inside the confidence bounds after lag 3, which suggests setting the moving average order to 3, that is, q=3.

The PACF, on the other hand, decays more gradually.

Given the information above, we could try fitting a MA(3) model to this data, or equivalently, ARIMA(p=0, d=0, q=3). In practice, we would investigate multiple models before deciding on a final fit.
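As a sanity check, we could also compare our manual choice against auto.arima() from the forecast package mentioned earlier. A minimal sketch, assuming the forecast package is installed; it may not select exactly the same order, since it searches over its own information criteria.

# Automated order selection as a cross-check (forecast package)
library(forecast)
auto.arima(data, d=0)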

Choosing the right ARIMA model: Example 2

# Read time series data 
data2 = read.csv("fittingmodelEg2.csv", header=F)
data2 = ts(data2[, 1])

# Plot data, ACF and partial ACF 
m = matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE)
layout(m)
ts.plot(data2, ylab="")
acf(data2,main="")
pacf(data2,main="")
par(mfrow=c(1,1))
Image by Author

Similar to example 1, there is no obvious trend in the data itself, and the variance appears reasonably constant over time.

The ACF does not show a steadily decreasing trend and so the data appears to be stationary. As a result, we can set d=0.

The PACF, on the other hand, drops suddenly inside the confidence bounds after lag 2, suggesting that we set the autoregressive order to 2, in other words, p=2.

Based on the information above, we could try fitting an AR(2) model to this data, or equivalently, an ARIMA(p=2, d=0, q=0) model.
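A quick sketch of what fitting this candidate model could look like (the estimated coefficients will of course depend on the data in the file):

# Fit AR(2), i.e. ARIMA(2, 0, 0), to the example 2 data and inspect the residuals
ar2 = arima(data2, order=c(2, 0, 0))
ar2
acf(ar2$residuals, main="Sample ACF of AR(2) residuals")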


Testing the fit of a model

Now that we have briefly gone over how to choose a model for a particular time series, let’s discuss how to test whether the chosen model is actually a good fit.

In this section, we will look at three separate techniques to test the fit of a time series model:

  1. Graph of residuals
  2. Ljung-Box test
  3. Akaike Information Criterion (AIC)

1. Graph of residuals

The graph of residuals, as the name suggests, plots the portion of the data that is not accounted for by the model, namely the residuals.

A model that fits the data well will produce residuals that are centred around a mean of zero and have a relatively stable variance over time.

Below is the graph of residuals after fitting an MA(3) model to the data from example 1.

# Fit MA(3) model to data and extract residuals 
ma3 = arima(data, order=c(0, 0, 3))
residuals = ma3$residuals

# Plot residuals and ACF of residuals 
par(mfrow=c(2, 1))
ts.plot(residuals, main="MA(3) residuals", ylab="Residuals", col="blue")
acf(residuals, main="", ylab="ACF")
par(mfrow=c(1, 1))
Image by Author

The mean and variance of the residuals are broadly constant over time. The autocorrelations of the residuals are small and show no significant pattern, so we can conclude that the residuals appear to be independent.

As a result, we can also conclude that the MA(3) model provides a good fit to the data.

2. Ljung-Box test

The Ljung-Box test is a statistical test of whether a group of autocorrelations of a time series differ from zero. The null hypothesis is that the residuals are independently distributed, in other words, that the model has captured the autocorrelation in the data.

# Ljung-Box test on the MA(3) residuals; fitdf=3 accounts for the three estimated MA coefficients
Box.test(residuals, lag=5, type="Ljung", fitdf=3)

From the output above, since the p-value is greater than 0.05, we do not reject the null hypothesis and conclude that the residuals show no significant autocorrelation, in other words, the model is a good fit to the data.

3. Akaike Information Criterion (AIC)

Last but not least, the AIC is a measure of the trade-off between the goodness of fit of a model and the number of parameters used in the model.

While a model with many parameters may provide a very good fit to the data, it may not necessarily be an accurate predictor of the future. We commonly label this as overfitting in Machine Learning. Conversely, a model with too few parameters may not be sufficient to capture the significant patterns that are in the underlying data itself.

Therefore, a good model must be able to strike a healthy balance between the two effects, and the AIC helps us to objectively measure this.
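Concretely, AIC = -2 * log-likelihood + 2 * k, where k is the number of estimated parameters. As a rough sanity check, and assuming the ma3 model fitted in the previous section is still in the workspace and had no fixed coefficients, we can reproduce the AIC that arima() reports:

# Manual AIC for the MA(3) fit: k counts the estimated coefficients plus the innovation variance
k = length(ma3$coef) + 1
-2 * ma3$loglik + 2 * k
ma3$aic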

To see how the AIC works in practice, let’s first determine an appropriate model for the new time series below.

# Read time series data 
data3 = read.csv("fittingmodelEg3.csv", header=F)
data3 = ts(data3[, 1])

# Plot data without differencing and differenced data 
m = matrix(c(1, 1, 4, 4, 2, 3, 5, 6), 2, 4, byrow=TRUE)
layout(m)
ts.plot(data3, main="Data without differencing", ylab="")
acf(data3, main="", ylab="Sample ACF")
pacf(data3, main="", ylab="Sample PACF")
d = diff(data3)
ts.plot(d, main="Differenced data", ylab="")
acf(d, main="", ylab="Sample ACF")
pacf(d, main = "", ylab="Sample PACF")
par(mfrow=c(1, 1))
Image by Author

The left half of the output above shows three plots of the original time series. The data has a clear downward trend, which is reflected in a slowly decaying sample ACF. This suggests that we need to difference the data in order to remove its trend.

The right half of the output relates to the same time series after differencing once. We can see that after differencing, the slow decay in the sample ACF is gone.

So far, we know that our model should be of the form ARIMA(p, d=1, q). To determine p and q, we will use the AIC: specifically, we will choose the combination of p and q that gives the lowest AIC value.

# Try values 0-2 for both p and q, and record their respective AIC values in a results table
aic.result = numeric(3)
for (p in 0:2) {
  for (q in 0:2) {
    aic = arima(d, order=c(p, 0, q))$aic
    aic.result = rbind(aic.result, c(p, q, aic))
  }
}
aic.result = aic.result[-1, ]
colnames(aic.result) = c("p", "q", "AIC")
aic.result
Image by Author

We can see that p=2 and q=2 yield the lowest AIC, hence we should fit an ARIMA(2, 1, 2) model to this particular data.


Forecasting using an ARIMA model

We are nearing the end of our time series analysis and forecasting exercise. Now that we have determined the right model for our data, let’s use it to generate future predictions.

For simplicity, let’s predict 100 steps ahead from the final data point in our original time series process.

# Fit ARIMA(2, 0, 2) to differenced data, since data has already been differenced, we can set d=0 
fit = arima(d, order=c(2, 0, 2))
# Predict 100 steps ahead using ARIMA(2, 1, 2) model 
predictions = predict(fit, n.ahead=100)
predictions = predictions$pred

# Undo the differencing: add the cumulative sum of the predicted differences to the last observed value
predictions.with.trend = tail(data3, 1) + cumsum(predictions)
predictions.with.trend = ts(predictions.with.trend, start=501, frequency=1)

# Plot past data and forecasts 
xlim = c(0, 600)
ylim = c(floor(min(data3, predictions.with.trend)), ceiling(max(data3, predictions.with.trend)))
ts.plot(data3, xlim=xlim, ylim=ylim, col="blue", main="Past data and forecasts", ylab="")
lines(predictions.with.trend, col="red")
Image by Author
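As an alternative sketch, we could let arima() handle the differencing by fitting ARIMA(2, 1, 2) directly to data3. predict() then returns forecasts on the original scale together with standard errors, which give approximate 95% prediction intervals; this should broadly agree with the manual cumsum approach above.

# Fit the integrated model directly and forecast with approximate 95% intervals
fit2 = arima(data3, order=c(2, 1, 2))
pred2 = predict(fit2, n.ahead=100)
upper = pred2$pred + 1.96 * pred2$se
lower = pred2$pred - 1.96 * pred2$se
ts.plot(data3, pred2$pred, col=c("blue", "red"), main="Forecasts with approximate 95% intervals", ylab="")
lines(upper, col="red", lty=2)
lines(lower, col="red", lty=2)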

We have covered quite a bit in this blog post. To summarise, we first learned about the concept of stationarity and how to spot whether a time series is stationary by examining its trend and seasonality. We also learned a few handy techniques for removing trend and seasonality from a time series.

We then proceeded to explore time series forecasting, where we learned how to choose the right ARIMA model for our data, test its goodness of fit, and finally generate predictions from the chosen model.

All in all, this blog post has barely scratched the surface of the vast topic that is time series analysis. Nevertheless, I hope it has served as a helpful general introduction and laid a foundation for you to explore further.

If you found any value from this article and are not yet a Medium member, it would mean a lot to me as well as the other writers on this platform if you sign up for membership using the link below. It encourages us to continue putting out high quality and informative content just like this one – thank you in advance!

Join Medium with my referral link – Jason Chong



