
If you have worked with time series, you have probably already used seasonal_decompose from statsmodels (or its R equivalent). Long story short, it splits a time series into three components: trend, seasonality, and the residuals. After running the command, you see something like the plot below.

The interpretation of the components is also very intuitive:
- trend – the general direction of the series over a long period of time
- seasonality – a distinct, repeating pattern observed at regular intervals due to various seasonal factors. It could be monthly, weekly, etc.
- residual – the irregular component consisting of the fluctuations in the time series after removing the previous components
This should be enough of a recap. I will also not go into much detail on the goals of seasonal decomposition (understanding the data, forecasting, outlier detection). Instead, I wanted to explore a less popular angle – what actually happens when you call the seasonal_decompose function. In this hands-on article, we will answer the question: how are those components estimated? If you are curious, read on!
Theory
Let’s assume we are dealing with the additive model, that is, one consisting of a linear trend and a seasonal cycle with constant frequency (width) and amplitude (height). For the multiplicative model, you just need to replace the additions with multiplications and the subtractions with divisions.
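Writing the observed value at time t as Y(t), the trend as T(t), the seasonal component as S(t), and the residual as R(t), the two variants are:

```
additive:        Y(t) = T(t) + S(t) + R(t)
multiplicative:  Y(t) = T(t) * S(t) * R(t)
```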
Trend component
Trend is calculated using a centered moving average of the time series. The moving average is calculated using a window length corresponding to the frequency of the time series. For example, we would use a window of length 12 for monthly data.
Smoothing the series using such a moving average comes with some disadvantages. First, we are "losing" the first and last few observations of the series. Second, the MA tends to over-smooth the series, which makes it less responsive to sudden changes in the trend or jumps.
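As a minimal sketch (using a toy monthly series, not the dataset from the tutorial below), the centered moving average can be computed with pandas like this:

```python
import numpy as np
import pandas as pd

# toy monthly series: linear trend plus a yearly cycle (for illustration only)
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
y = pd.Series(100 + 2 * np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12), index=idx)

# centered moving average; 13 = 12 months + 1 so that the window is centered
# on an actual observation
trend = y.rolling(window=13, center=True).mean()

# the first and last 6 values are NaN - these observations are "lost"
print(trend.isna().sum())  # 12
```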
Seasonal component
To calculate the seasonal component, we first need to detrend the time series. We do it by subtracting the trend component from the original time series (remember, we divide for the multiplicative variant).
Having done that, we calculate the average values of the detrended series for each seasonal period. In the case of months, we would calculate the average detrended value for each month.
The seasonal component is simply built from the seasonal averages repeated for the length of the entire series. Again, this is one of the arguments against using the simple seasonal decomposition – the seasonal component is not allowed to change over time, which can be a very strict and often unrealistic assumption for longer time series.
On a side note, in the additive decomposition the detrended series is centered at zero, as adding zero makes no change to the trend. The same logic is applied in the multiplicative approach, with the difference that it is centered around one. That is because multiplying the trend by one also has no effect on it.
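Using the same toy series as in the previous sketch, the seasonal component of the additive variant could be estimated as follows:

```python
import numpy as np
import pandas as pd

# same toy monthly series as in the previous sketch
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
y = pd.Series(100 + 2 * np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12), index=idx)
trend = y.rolling(window=13, center=True).mean()

# detrend the series (subtract for the additive model)
detrended = y - trend

# average the detrended values per calendar month and broadcast them back
# over the whole series - this repeated pattern is the seasonal component
seasonal = detrended.groupby(detrended.index.month).transform("mean")

# in the additive case the seasonal component is (roughly) centered at zero
print(round(seasonal.mean(), 3))
```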
Residuals
The last component is simply what is left after removing (by subtracting or dividing) the trend and seasonal components from the original time series.
That would be all for the theory – let’s code!
Step-by-step tutorial
Setup
As always, we start by importing the libraries.
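The exact imports in the original notebook may differ slightly; a minimal set for what follows could be:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from statsmodels.tsa.seasonal import seasonal_decompose, DecomposeResult
```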
Data
We will be using probably the most popular dataset for time series analysis – the classic airline passengers time series. We load it from a CSV file (available [here](https://github.com/mwaskom/seaborn-data/blob/master/flights.csv)), but you can also get it from other sources, for example, from the seaborn library (via sns.load_dataset("flights")).
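A possible way to load the CSV and attach a proper monthly index is sketched below (the preprocessing in the accompanying notebook may differ). The rows in the file are already ordered chronologically (January 1949 to December 1960), so we can simply generate a matching DatetimeIndex:

```python
import pandas as pd

URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv"

# columns: year, month (name), passengers
df = pd.read_csv(URL)

# the rows are ordered chronologically, so a generated monthly index is enough
y = df["passengers"].copy()
y.index = pd.date_range("1949-01-01", periods=len(y), freq="MS")

ax = y.plot(title="Monthly airline passengers", figsize=(10, 4))
ax.set_ylabel("passengers")
```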

Just by eyeballing the plot, it seems like the multiplicative decomposition might be a better choice (especially when looking at the increasing strength of the seasonal component over time). But we will stay in line with what we assumed in the intro and carry out the additive decomposition. We leave the multiplicative one as an optional exercise.
Benchmark from statsmodels
Before decomposing the time series ourselves, we can get the benchmark using statsmodels.
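A sketch of the call (assuming the series is stored in y, as in the loading step above):

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# additive decomposition with a yearly seasonal period;
# with a proper DatetimeIndex the period is inferred, but we can be explicit
decomposition = seasonal_decompose(y, model="additive", period=12)

fig = decomposition.plot()
fig.set_size_inches(10, 8)
```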

In the plot, we can see another hint that the additive model is not the right choice here – there are clear patterns in the residuals over time. In the case of a good fit, we would expect the residuals to behave randomly, without any pattern.
For reference, we can easily extract the components from the DecomposeResult object. They are stored under trend, seasonal, and resid.
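Continuing with the decomposition object from the sketch above, that looks like this:

```python
# each attribute is a pandas Series aligned with the original index
print(decomposition.trend.head())
print(decomposition.seasonal.head())
print(decomposition.resid.head())
```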
Manual decomposition
For the manual decomposition, we will use a pandas DataFrame to store the original series, the extracted components, and all the intermediate results.
We have already written out the battle plan in the theory section above, so let’s execute all the steps in one code snippet (you can follow the cell-by-cell flow in the Notebook on GitHub).
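The exact snippet lives in the notebook; a sketch that follows the same steps (assuming the series y from the loading step and the additive model) could look like this:

```python
import pandas as pd

# assuming `y` is the passengers series with a monthly DatetimeIndex (see above)
df = pd.DataFrame({"passengers": y})

# 1. trend - centered moving average (12 months + 1 to keep the window centered)
df["trend"] = df["passengers"].rolling(window=13, center=True).mean()

# 2. detrended series - subtract the trend (additive model)
df["detrended"] = df["passengers"] - df["trend"]

# 3. seasonal component - average detrended value per calendar month,
#    broadcast back to all rows of that month with `transform`
df["month"] = df.index.month
df["seasonal"] = df.groupby("month")["detrended"].transform("mean")

# 4. residuals - whatever is left after removing trend and seasonality
df["resid"] = df["passengers"] - df["trend"] - df["seasonal"]

df.head(15)
```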
Running the snippet generates the following table:

A few things worth mentioning about the calculations:
- we have used a rolling window of length 13 (12 months + 1 to make it an odd number for the centered average),
- we used a very handy method called transform to calculate the average values per group. We used it to avoid creating a separate DataFrame with the aggregated values and then joining it back to the original DF. You can read more about it (and some other useful pandas functionalities) here. A small illustration of the difference follows after this list,
- we displayed the first 15 rows so we can see that the aggregated seasonal component is calculated correctly for all the months (including the ones that have missing values for the detrended series).
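For instance, the two approaches below produce the same seasonal averages, but transform keeps the result aligned with the original rows (a tiny illustrative example, not from the original notebook):

```python
import pandas as pd

tiny = pd.DataFrame({"month": [1, 2, 1, 2], "detrended": [4.0, -3.0, 6.0, -5.0]})

# with transform: one value per original row, ready to be assigned as a new column
tiny["seasonal"] = tiny.groupby("month")["detrended"].transform("mean")

# without transform: aggregate first, then merge the result back
agg = (
    tiny.groupby("month", as_index=False)["detrended"]
    .mean()
    .rename(columns={"detrended": "seasonal_agg"})
)
tiny = tiny.merge(agg, on="month")

print(tiny)
```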
Lastly, we plot the decomposition.
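A simple way to plot the four components stored in the DataFrame (a sketch; the notebook may style the figure differently):

```python
import matplotlib.pyplot as plt

# assuming `df` is the DataFrame built in the snippet above
components = ["passengers", "trend", "seasonal", "resid"]

fig, axes = plt.subplots(len(components), 1, sharex=True, figsize=(10, 8))
for ax, name in zip(axes, components):
    df[name].plot(ax=ax)
    ax.set_ylabel(name)
fig.suptitle("Manual additive decomposition")
plt.show()
```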

The results look very similar to what we have obtained using statsmodels. We did not plot the residuals as points (we kept the default line instead); however, we should still be able to see the same patterns quite easily.
Results comparison
To check whether we have an exact match, we could look at the components extracted with seasonal_decompose and compare them to the ones we calculated manually. Spoiler alert: they are very similar, yet not identical.
We opted for another approach, that is, comparing the two decompositions visually. First, we store the results of the manual decomposition in a DecomposeResult object. Then, we borrow a great helper function to add the results of one decomposition to an existing decomposition plot (source of the function).
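A sketch of that idea (the borrowed helper is more polished; here we simply overlay the manual components, stored in the DataFrame df from the snippet above, on the axes of the statsmodels plot produced by the decomposition object from before):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import DecomposeResult

# wrap the manually computed components in a DecomposeResult
manual = DecomposeResult(
    observed=df["passengers"],
    trend=df["trend"],
    seasonal=df["seasonal"],
    resid=df["resid"],
)

# plot the statsmodels decomposition and draw the manual components on top;
# the four axes correspond to observed, trend, seasonal, and residuals
fig = decomposition.plot()
manual_components = [manual.observed, manual.trend, manual.seasonal, manual.resid]
for ax, series in zip(fig.axes, manual_components):
    ax.plot(series.index, series.values, linestyle="--", linewidth=1)

fig.set_size_inches(10, 8)
plt.show()
```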

In the figure above, we see that the results are very close. The differences can be attributed to the way the components are calculated – as always, the devil is in the details. In statsmodels, the moving average used for extracting the trend component is computed with a 1-D convolution filter (via the convolution_filter function). As you can see, the outcome is very similar, but still a bit different, and this difference then propagates to the seasonal component.
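For the curious, this is roughly what happens under the hood (a sketch based on the statsmodels source; it assumes the series y from the loading step and an even period of 12):

```python
import numpy as np
from statsmodels.tsa.filters.filtertools import convolution_filter

period = 12

# for an even period, statsmodels uses half weights at both ends of the window,
# which is equivalent to a "2 x 12" centered moving average
filt = np.array([0.5] + [1.0] * (period - 1) + [0.5]) / period

# assuming `y` is the passengers series from the loading step
trend_conv = convolution_filter(y, filt, nsides=2)
print(trend_conv.dropna().head())
```

Note that this weighted 13-point window is slightly different from the plain 13-point average used in our manual version, which explains the small discrepancies between the two trends.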
Takeaways
- the basic approach to seasonal decomposition splits the time series into three components: trend, seasonal, and residual,
- the trend component is calculated as a centered moving average of the original series,
- the seasonal component is calculated as the per-period average of the detrended series,
- the residual component is obtained after removing the trend and seasonal components from the time series.
You can find the code used for this article on my GitHub. Also, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.
Liked the article? Become a Medium member to continue learning by reading without limits. If you use this link to become a member, you will support me at no extra cost to you. Thanks in advance and see you around!
You might also be interested in one of the following:
Pandas Is Not Enough? A Comprehensive Guide To Alternative Data Wrangling Solutions
A Step-by-Step Guide to Calculating Autocorrelation and Partial Autocorrelation
References
- Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976) Time Series Analysis, Forecasting and Control. Third Edition. Holden-Day. Series G.