Seasonality is a crucial aspect of time-series analysis. As time-series are indexed forward in time, they are subject to seasonal fluctuations. For example, we expect ice cream sales to be higher in the summer months and lower in the winter months.
Seasonality can occur over different time intervals, such as days, weeks or months. The key for time-series analysis is to understand how seasonality affects our series, which in turn helps us produce better forecasts for the future.
In this post we will go over an example of seasonal data and then show how we can remove the seasonality. The reason we want to remove it is to make our time series stationary, which is a requirement of many forecasting models. If you want to learn more about stationarity, check out my previous posts here:
The data is indexed by month and we can clearly see a yearly seasonal pattern where the number of passengers peaks in the summer months. There is also an overall trend of the number of passengers increasing through time.
Removing Seasonality
We can remove seasonality from the data using differencing, which calculates the difference between the current value and the value in the previous season. This is done to make the time series stationary, meaning its statistical properties are constant through time. Seasonality causes the mean of the time series to differ depending on which season we are in, so its statistical properties are not constant.
Seasonal differencing is mathematically described as:
d(t) = y(t) - y(t-m)
Equation generated by author in LaTeX.
Where d(t) is the differenced data point at time t, y(t) is the value of the series at t, y(t-m) is the value of the data point at the previous season and m is the length of one season. In our case m=12 as we have yearly seasonality.
We can use the pandas diff() method to calculate the seasonal differences and plot the resultant series:
Plot generated by author in Python.
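As a minimal sketch of this step, the snippet below applies diff() with a period of 12 to a synthetic monthly series. The synthetic data (a linear trend plus a repeating 12-month pattern) and the variable names are assumptions for illustration only, not the passenger data used in this post.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (an assumption for illustration, not the article's data):
# a linear trend plus a repeating 12-month pattern.
t = np.arange(48)
index = pd.date_range("2000-01-01", periods=48, freq="MS")
y = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=index)

m = 12  # length of one season in months

# d(t) = y(t) - y(t - m); the first m values are NaN because
# there is no previous season to subtract.
seasonal_diff = y.diff(m)

# The same calculation written out with an explicit shift.
manual_diff = y - y.shift(m)

print(seasonal_diff.dropna().head())
```

For this toy series the repeating pattern cancels out and only the yearly contribution of the trend (2 per month over 12 months) is left. Calling seasonal_diff.plot(), which relies on matplotlib, would draw the differenced series.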
The yearly seasonality has now disappeared; however, we can now observe some cycle. Cycles are another common feature of time series: they are similar to seasonality but typically occur on a longer timescale, as observed here.
We can test whether the resultant series is stationary using the Augmented Dickey-Fuller (ADF) test. The null hypothesis of this test is that the series is non-stationary. The statsmodels package provides a function for carrying out the ADF test:
The p-value is below the 5% and 10% thresholds, but above the 1% threshold. Therefore, we can reject the null hypothesis of non-stationarity at the 5% and 10% significance levels, but not at the 1% level, so whether we treat the series as stationary depends on the significance level we choose.
We could also carry out some further regular differencing (the difference between adjacent values) to reduce the p-value further. However, in this case I think the data is adequately stationary given the p-value is below the 5% threshold.
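To illustrate what that extra step would look like, here is a small sketch that chains a regular first difference onto the seasonal difference. The toy series, with a quadratic trend chosen so that seasonal differencing alone still leaves a trend behind, is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Toy monthly series (an assumption for illustration): a quadratic trend
# plus a repeating 12-month pattern.
t = np.arange(60)
y = pd.Series(
    0.5 * t**2 + 10 * np.sin(2 * np.pi * t / 12),
    index=pd.date_range("2000-01-01", periods=60, freq="MS"),
)

# Seasonal differencing removes the 12-month pattern but, for this toy
# series, leaves a linear trend behind.
seasonal_diff = y.diff(12)

# Regular differencing: the difference between adjacent values.
double_diff = seasonal_diff.diff()

print(double_diff.dropna().head())
```

Each round of differencing costs some observations: 12 for the seasonal difference and one more for the regular difference.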
It is also best practice to stabilise the variance, as constant variance is one of the conditions of stationarity. To achieve this, we could have used the Box-Cox transform. If you want to learn more about stabilising the variance, check out my previous article on it:
In this article we have shown what seasonality is and what it looks like. We can remove seasonality through differencing and confirm whether the resultant series is stationary using the ADF test.
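As a rough sketch of that idea, the snippet below applies scipy's boxcox function before seasonal differencing. The synthetic series, whose seasonal swings grow with its level, and the variable names are assumptions for illustration rather than the workflow used in this post.

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

# Synthetic monthly series (an assumption for illustration): the seasonal
# swings grow as the level of the series grows.
t = np.arange(96)
level = 100 + 2 * t
y = pd.Series(
    level * (1 + 0.2 * np.sin(2 * np.pi * t / 12)),
    index=pd.date_range("2000-01-01", periods=96, freq="MS"),
)

# boxcox requires strictly positive data. When lmbda is not given, it returns
# the transformed values together with the fitted lambda.
transformed, fitted_lambda = boxcox(y.to_numpy())
y_boxcox = pd.Series(transformed, index=y.index)

# Seasonal differencing on the transformed series.
seasonal_diff = y_boxcox.diff(12).dropna()

print(f"Fitted lambda: {fitted_lambda:.3f}")
print(seasonal_diff.head())
```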
The full Python script for this article can be found at my GitHub here: