
I recently fell in love with the topic of time series. I find it fascinating that so many things we observe in the world around us, and perhaps sometimes take for granted, can actually be explained using time.
Think about it: road traffic tends to peak during certain hours of the day, ice-cream sales are usually higher during the summer, and, more recently, the trend of COVID-19 cases changes over time.
There is so much you can tell simply by examining how a variable behaves and changes over time. In Data Science, this is what we call time series analysis. A time series is a sequence of data points indexed in time order, usually taken at successive, equally spaced points in time.
In this blog post, I will provide a gentle introduction to time series as well as share some basic terminology and concepts to help you get started in this exciting space. Specifically, we will cover:
- The concept of stationarity
- Why stationarity is important in time series analysis and forecasting
- How to test and generate a stationary time series process
- How to choose the right ARIMA model, test the fit of the model as well as generate future predictions
The code that goes along with this article can be found on my GitHub here.
Transform data into time series object
In order to properly work with time series data in R, we first have to define the data as a time series object using the ts() function.
# Read in the data and check its class
testing.stationarity = read.table("testing.stationarity.txt")
head(testing.stationarity)
class(testing.stationarity)

# Convert the data frame into a time series object
testing.stationarity = ts(testing.stationarity)
head(testing.stationarity)
class(testing.stationarity)

The data is exactly the same, just stored as different classes in R.
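As a quick aside, and purely as a sketch with made-up numbers, ts() can also record a series' start date and sampling frequency, which becomes useful later when we work with monthly data such as ldeaths.
# A sketch with simulated values: a monthly series starting in January 1974
monthly.example = ts(rnorm(72), start=c(1974, 1), frequency=12)
head(monthly.example)
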
Testing stationarity
Before we proceed to explore the various methods for testing stationarity in a time series process, it is helpful to first explain what stationarity is.
Stationarity is a crucial concept in time series analysis. Most time series models, such as the ARIMA model which we will discuss later, assume that the statistical properties of the series do not change over time. In other words, before we can fit a time series model to the data and use the model to generate predictions, we first need to ensure that the time series is stationary.
A time series is considered stationary if it satisfies the following three conditions:
- The expected value (mean) is constant over time
- The variance of the time series is constant over time
- The covariance between two points depends only on the time lag between them, not on their position in time
If this sounds confusing, don't worry: there are also visual cues you can look out for to quickly identify whether or not a time series is stationary. In practice, there are two components to look out for: trend and seasonality. A time series is not stationary if either of these components is present in the data.
Trend is the long-term trajectory of a time series. For instance, a stock market index such as the S&P 500 has shown an overall increasing trend over the last several decades.
Seasonality, on the other hand, represents a recurring pattern that happens at a fixed and known frequency. For example, annual retail sales tend to spike during the Christmas season.
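To make this concrete, here is a minimal sketch with simulated data (not the article's dataset): white noise is stationary, while its cumulative sum, a random walk, is not, because its variance grows over time.
# Simulated illustration: a stationary series vs. a non-stationary one
set.seed(1)
wn = ts(rnorm(500))          # white noise: constant mean and variance
rw = ts(cumsum(rnorm(500)))  # random walk: wanders, variance grows with time
par(mfrow=c(1, 2))
ts.plot(wn, main="Stationary: white noise", ylab="Value")
ts.plot(rw, main="Non-stationary: random walk", ylab="Value")
par(mfrow=c(1, 1))
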
Two common ways to test for stationarity in a time series are:
- Phillips-Perron Unit Root Test
- Plotting the sample ACF
The Phillips-Perron Unit Root Test, PP.test(), tests the null hypothesis that a time series is integrated of order one, in other words, that the time series requires further differencing to achieve stationarity.
PP.test(testing.stationarity)

Since the p-value is greater than 0.05, we do not have sufficient evidence to reject the null hypothesis, so we treat the time series as non-stationary and in need of differencing.
Alternatively, we could also plot the autocorrelation function (ACF) of the time series, which informs us of the correlation of the series with its lagged values.
acf(testing.stationarity, main="Data: testing.stationarity", ylab="Sample ACF")

As we can see from the ACF plot above, the autocorrelation decays slowly, which indicates that the time series is not stationary.
Removing trend
Now that we know trend and seasonality can cause a time series to be non-stationary, let's explore ways to remove them.
There are a few ways to remove trend from a time series. Here, I have outlined two of the simplest methods:
- Differencing
- Least squares trend removal
1. Differencing
Differencing means taking the difference between each data point and the data point a given lag before it.
Here, we contrast the time series from above with its lag-one difference, along with their respective autocorrelation functions.
# Difference data using lag=1 and differences=1
Xt = diff(testing.stationarity, lag=1, differences=1)
# Plot original data, differenced data, and their respective sample ACFs
par(mfrow=c(2,2))
ts.plot(testing.stationarity, main="Data: testing.stationarity",
ylab="Value")
ts.plot(Xt, main="Differenced data", ylab="Change in value")
acf(testing.stationarity, main="", ylab="Sample ACF")
acf(Xt, main="", ylab="Sample ACF")
par(mfrow=c(1,1))

We can clearly see that the trend that was once present in the original time series is now gone. Furthermore, the ACF now decays significantly faster.
Let’s perform the Phillips-Perron Unit Root Test to confirm our observation.
# Perform unit root test on differenced data
PP.test(Xt)

Since the p-value of the differenced data is lower than 0.05, we reject the null hypothesis and conclude that the time series does not need to be differenced again; it is now stationary.
Differencing: bonus!
To determine how many times to difference a time series, we can compare the variance of the series after each round of differencing and choose the order that gives the lowest overall variance.
# Difference the data 0-7 times and store the respective variances
ts.var = var(testing.stationarity)
for(i in 1:7) {
  diff.data = diff(testing.stationarity, lag=1, differences=i)
  ts.var[i+1] = var(diff.data)
}
ts.var.df = data.frame(diff=0:7, var=ts.var)
# Plot variance against the number of times data is differenced
plot(ts.var.df, type="l", ylab="Variance", xlab="d")

Variance is lowest when the time series process is differenced only once and increases subsequently. Hence, we conclude that the data does not need to be differenced a second time.
2. Least squares trend removal
Least squares trend removal, on the other hand, involves fitting a linear model to the time series and subtracting the fitted values from the data points.
Here, we have a different time series with an upward, positive trend over time.
# Generate time series data
set.seed(123)
n = 1000
sim = arima.sim(list(ar=0.9), n)
xt = 2000+cumsum(sim)
# Plot generated time series
ts.plot(xt, col="blue", main="Time series with trend", ylab="Data")

We can see that the simulated time series has an overall increasing trend. To remove this trend, we will fit a linear model to the data and then subtract the fitted values from the data points to obtain the residuals.
If done properly, the residuals should have no remaining trend, or in other words, have a constant mean over time.
# Fit a linear model on time series, extract fitted values and residuals
time = time(xt)
fit = lm(xt ~ time)
yt = fit$fitted.values
zt = fit$residuals
# Plot time series with superimposed linear model and residuals
par(mfrow=c(2,1))
ts.plot(xt, col="blue", main="Regression example", ylab="Data")
abline(fit, col="red")
plot(xt-yt, type="l", col="green", xlab="Time", ylab="Residuals")
par(mfrow=c(1, 1))

Removing seasonality
In addition to trend, seasonality can also cause a time series to be non-stationary.
Here, I will demonstrate three methods to eliminate seasonality from a time series:
- Seasonal differencing
- Seasonal means
- Method of moving averages
Before we begin, let's briefly discuss the ldeaths dataset in R. It contains monthly death counts from bronchitis, emphysema and asthma in the UK between 1974 and 1979, for both males and females.
# Plot ldeaths
plot(ldeaths, main="Monthly deaths from lung diseases in the UK", ylab="Deaths")
points(ldeaths, pch=20)
# Add red vertical line at the start of each year
abline(v=1974:1980, col="red")
# Plot sample ACF of ldeaths
acf(ldeaths, main="Sample ACF of ldeaths", ylab="Sample ACF", lag.max=36)


There is a very obvious yearly seasonal effect in the charts. Although the highest point each year does not necessarily fall in the same month, there is still a clear annual pattern in the data.
1. Seasonal differencing
Seasonal differencing means subtracting from each data point the data point one seasonal period (a fixed lag) earlier.
# Difference ldeaths using lag=12 i.e. January 1975 minus January 1974, February 1975 minus February 1974, and so on
sdiff.ldeaths = diff(ldeaths, lag=12, differences=1)
# Plot original data, differenced data, and their respective sample ACFs
par(mfrow=c(2,2))
ts.plot(ldeaths, main="Data: ldeaths", ylab="Number of deaths")
acf(ldeaths, main="Sample ACF of ldeaths", ylab="Sample ACF")
ts.plot(sdiff.ldeaths, main="Data: sdiff.ldeaths", ylab="Difference in number of deaths")
acf(sdiff.ldeaths, main="Sample ACF of sdiff.ldeaths", ylab="Sample ACF")
par(mfrow=c(1,1))

2. Seasonal means
The seasonal means approach involves subtracting from each data point its respective group average, which in this particular scenario is the monthly average.
# Generate ldeaths as dataframe
ldeaths.df = data.frame(year=rep(1974:1979, each=12),
month=rep(1:12, 6),
value=ldeaths)
head(ldeaths.df, 12)

# Monthly averages of ldeaths dataset
xbars = aggregate(value ~ month, data = ldeaths.df, mean)
xbars

# Subtract each month in ldeaths by their respective means
yt = ldeaths - xbars$value
# Plot ldeaths after subtracting seasonal means
par(mfrow=c(2, 1))
plot(yt, main="Monthly deaths from lung diseases in the UK", ylab="Deaths")
points(yt, pch=20)
acf(yt, main="Sample ACF of the series ldeaths less seasonal means", ylab="Sample ACF", lag.max=36)
par(mfrow=c(1, 1))

We can see from both approaches above that the new time series no longer has the seasonality that was originally present.
3. Method of moving averages
Last but not least, the method of moving averages estimates the trend of a time series by smoothing the series with a moving average. Using this trend estimate, we can decompose a time series into its trend, seasonal and random (noise) components.
# Decompose ldeaths into its trend, seasonal and random components
plot(decompose(ldeaths))

# Store trend, seasonal and random as individual variables
decomp = decompose(ldeaths)
trend = decomp$trend
seasonal = decomp$seasonal
random = decomp$random
# Plot data, trend and seasonal + trend
ts.plot(ldeaths, ylab="", main="Components of time series: ldeaths", col="grey")
lines(trend, col="red")
lines(seasonal+trend, col="blue")
legend("topright", legend=c("Data", "Trend", "Seasonal + trend"), col=c("grey", "red", "blue"), lty=1)

Model fitting
The ARIMA(p, d, q) model is one of the most commonly used time series models. It is made up of three hyperparameters:
- p = order of the autoregressive (AR) term
- d = degree of differencing (I)
- q = order of the moving average (MA) term
Although there are existing functions such as auto.arima() in the forecast package that can help us determine these hyperparameters, in this section, we will focus on some examples to learn how to manually select values for p, d, and q.
Specifically, we will be investigating the ACF and Partial ACF plots of two separate time series processes to determine the hyperparameters for our ARIMA models.
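For completeness, here is a minimal sketch of the automated route (assuming the forecast package is installed); your.series is just a hypothetical placeholder for any univariate time series object.
# A sketch: auto.arima() searches over p, d and q and returns the best-scoring model
library(forecast)
fit.auto = auto.arima(your.series)  # 'your.series' is a hypothetical placeholder
summary(fit.auto)
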
Choosing the right ARIMA model: Example 1
# Read time series data
data = read.csv("fittingmodelEg1.csv", header=F)
data = ts(data[, 1])
# Plot data, ACF and partial ACF
m = matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE)
layout(m)
ts.plot(data, ylab="")
acf(data,main="")
pacf(data,main="")
par(mfrow=c(1,1))

We previously covered the autocorrelation function (ACF), which informs us of the correlation of the series with its lagged values. The partial autocorrelation function (PACF), on the other hand, measures the correlation between the series and its lagged values after the effects of the intermediate lags have been removed.
By looking at the time series above, we can see that there is no obvious trend and the variance also appears reasonably constant over time.
The ACF does not decrease slowly, which suggests that the process does not need to be differenced; in other words, we set d=0. Moreover, the ACF drops abruptly inside the confidence bounds after lag 3, which suggests setting the moving average order to 3, that is, q=3.
PACF, on the other hand, decays more gradually.
Given the information above, we could try fitting an MA(3) model to this data, or equivalently, ARIMA(p=0, d=0, q=3). In practice, we would investigate multiple models before deciding on a final fit.
Choosing the right ARIMA model: Example 2
# Read time series data
data2 = read.csv("fittingmodelEg2.csv", header=F)
data2 = ts(data2[, 1])
# Plot data, ACF and partial ACF
m = matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE)
layout(m)
ts.plot(data2, ylab="")
acf(data2,main="")
pacf(data2,main="")
par(mfrow=c(1,1))

Similar to example 1, there is no obvious trend in the data itself, and the variance appears to be reasonably constant over time.
The ACF does not show a steadily decreasing trend and so the data appears to be stationary. As a result, we can set d=0.
The PACF, on the other hand, drops suddenly inside the confidence bounds after lag 2, suggesting that we should set the autoregressive order to 2, in other words, p=2.
Based on the information above, we could try fitting an AR(2) model to this data, or equivalently, an ARIMA(p=2, d=0, q=0) model.
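As a small sketch, we could fit this candidate model using the base arima() function and inspect the estimated coefficients.
# Fit the candidate AR(2) model, i.e. ARIMA(2, 0, 0), to the series from example 2
ar2 = arima(data2, order=c(2, 0, 0))
ar2  # estimated coefficients and their standard errors
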
Testing the fit of a model
Now that we briefly went over how to choose the right model for a particular time series data, let’s discuss how to test if the model we chose is actually the right fit.
In this section, we will look at three separate techniques to test the fit of a time series model:
- Graph of residuals
- Ljung-Box test
- Akaike Information Criterion (AIC)
1. Graph of residuals
A graph of residuals, as the name suggests, plots the part of the time series data that is not accounted for by the model, namely the residuals.
A model that fits the data well will produce residuals that are centred around zero with a relatively stable variance over time.
Below is the graph of residuals after fitting an MA(3) model to the data from example 1.
# Fit MA(3) model to data and extract residuals
ma3 = arima(data, order=c(0, 0, 3))
residuals = ma3$residuals
# Plot residuals and ACF of residuals
par(mfrow=c(2, 1))
ts.plot(residuals, main="MA(3) residuals", ylab="Residuals", col="blue")
acf(residuals, main="", ylab="ACF")
par(mfrow=c(1, 1))

The mean and variance of the residuals plot are broadly constant over time. The autocorrelations of the residuals are small, with no significant pattern, so the residuals appear to be independent.
As a result, we can also conclude that the MA(3) model provides a good fit to the data.
2. Ljung-Box test
The Ljung-Box test is a statistical test of whether a group of autocorrelations of a time series are jointly different from zero.
Box.test(residuals, lag=5, type="Ljung", fitdf=3)

From the output above, since the p-value is greater than 0.05, we do not reject the null hypothesis that the residual autocorrelations are zero, which is consistent with the model being a good fit to the data.
3. Akaike Information Criterion (AIC)
Last but not least, the AIC is a measure of the trade-off between the goodness of fit of a model and the number of parameters used in the model.
While a model with many parameters may provide a very good fit to the data, it may not necessarily be an accurate predictor of the future. We commonly label this as overfitting in Machine Learning. Conversely, a model with too few parameters may not be sufficient to capture the significant patterns that are in the underlying data itself.
Therefore, a good model must be able to strike a healthy balance between the two effects, and the AIC helps us to objectively measure this.
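As a brief sketch of what the number actually is, the AIC reported by arima() equals 2k minus twice the maximised log-likelihood, where k counts the estimated coefficients plus the innovation variance; we can check this against the MA(3) fit from earlier.
# A sketch: AIC = 2k - 2 * log-likelihood for the MA(3) model fitted above
k = length(ma3$coef) + 1   # coefficients plus the innovation variance
2 * k - 2 * ma3$loglik     # should match the stored value below
ma3$aic
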
Before we put the AIC to work in practice, let's determine an appropriate model for the new time series below.
# Read time series data
data3 = read.csv("fittingmodelEg3.csv", header=F)
data3 = ts(data3[, 1])
# Plot data without differencing and differenced data
m = matrix(c(1, 1, 4, 4, 2, 3, 5, 6), 2, 4, byrow=TRUE)
layout(m)
ts.plot(data3, main="Data without differencing", ylab="")
acf(data3, main="", ylab="Sample ACF")
pacf(data3, main="", ylab="Sample PACF")
d = diff(data3)
ts.plot(d, main="Differenced data", ylab="")
acf(d, main="", ylab="Sample ACF")
pacf(d, main = "", ylab="Sample PACF")
par(mfrow=c(1, 1))

The left half of the output above shows three plots of the original time series. The data shows a clear downward trend, which is reflected in a slowly decaying sample ACF. This suggests that we need to difference the data in order to remove its trend.
The right half of the output relates to the same time series after differencing once. We can see that after differencing, the slow decay in the sample ACF is gone.
So far, we know that our ARIMA model should be of ARIMA(p, d=1, q). To help us determine the values for p and q, we will deploy the AIC. Specifically, we will choose the combination of values for both p and q based on the one that gives us the lowest AIC value.
# Try values 0-2 for both p and q, record their respective AIC and put them into a data frame
aic.result = numeric(3)
for (p in 0:2) {
  for (q in 0:2) {
    aic = arima(d, order=c(p, 0, q))$aic
    aic.result = rbind(aic.result, c(p, q, aic))
  }
}
aic.result = aic.result[-1, ]
colnames(aic.result) = c("p", "q", "AIC")
aic.result

We can see that p=2 and q=2 yield the lowest AIC, hence we should fit an ARIMA(2, 1, 2) model to this particular data.
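As a small sketch, note that instead of differencing by hand and fitting ARIMA(2, 0, 2), we can fit ARIMA(2, 1, 2) directly to the original series and let arima() handle the differencing; the two routes are essentially equivalent apart from the mean term.
# A sketch: fit ARIMA(2, 1, 2) directly to the undifferenced data
fit.212 = arima(data3, order=c(2, 1, 2))
fit.212
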
Forecasting using an ARIMA model
We are nearing the end of our Time Series Analysis and forecasting exercise. Now that we have determined the right model to use for our data, let’s use it to generate future predictions.
For simplicity, let’s predict 100 steps ahead from the final data point in our original time series process.
# Fit ARIMA(2, 0, 2) to differenced data, since data has already been differenced, we can set d=0
fit = arima(d, order=c(2, 0, 2))
# Predict 100 steps ahead using ARIMA(2, 1, 2) model
predictions = predict(fit, n.ahead=100)
predictions = predictions$pred
# Convert forecasts of the differenced series back to the original scale by cumulatively summing them and adding the last observed value
predictions.with.trend = tail(data3, 1) + cumsum(predictions)
predictions.with.trend = ts(predictions.with.trend, start=501, frequency=1)
# Plot past data and forecasts
xlim = c(0, 600)
ylim = c(floor(min(data3, predictions.with.trend)), ceiling(max(data3, predictions.with.trend)))
ts.plot(data3, xlim=xlim, ylim=ylim, col="blue", main="Past data and forecasts", ylab="")
lines(predictions.with.trend, col="red")
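
As an optional sketch, predict() also returns standard errors. Using the fit.212 model fitted directly to data3 earlier, we can add approximate 95% prediction intervals on the original scale.
# A sketch: point forecasts with approximate 95% prediction intervals
pred.212 = predict(fit.212, n.ahead=100)
upper = pred.212$pred + 1.96 * pred.212$se
lower = pred.212$pred - 1.96 * pred.212$se
ts.plot(data3, xlim=c(0, 600), ylim=range(data3, upper, lower),
        col="blue", main="Past data and forecasts with 95% intervals", ylab="")
lines(pred.212$pred, col="red")
lines(upper, col="red", lty=2)
lines(lower, col="red", lty=2)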

We have covered quite a bit in this blog post. To summarise, we first learned about the concept of stationarity, followed by how to test whether a time series is stationary by examining its trend and seasonality. We also learned a few handy techniques to remove trend and seasonality from a time series.
We then proceeded to explore time series forecasting, where we learned how to choose the right ARIMA model for our data, test its goodness of fit and finally generate predictions from the chosen model.
All in all, this blog post has barely scratched the surface of the vast topic that is time series analysis, but I hope it has served as a helpful general introduction and laid a foundation for you to explore further.