Discovery of Seasonality in Time Series

A Fourier transform analysis using R

Fernando Barbalho
Towards Data Science

Photo by Photoholgic on Unsplash

Government revenues and expenditures, traveler flows, and export and import values are time series related to economic phenomena. By their nature, they are subject to trends, structural breaks, and other elements. These factors can lead to false conclusions when trying to find the seasonality that best characterizes such time series. In the following paragraphs, we will analyze a case that, at first, may bring some frustration but, in the end, turns out to be an excellent use case for one of the most agile techniques for time series analysis: the Fourier transform. For this material, I used the R language to fetch and process the data, do the investigation, and generate the graphs. The link to the code is available at the end of the text. It is the prize for those who read to the end.

Take a look at the series below that describes the 23-year monthly behavior of one of the elements used to determine the operational result of the Brazilian central government. Notice that cycles point to periodic peaks and some dents between these peaks. Notice also that there is a growing trend and then some stabilization.

The whole time series. Image by the author

At first glance, we might bet that the most critical seasonality is the annual one: the graph makes it reasonable to believe that the peaks recur every twelve months throughout the time series. On the other hand, a more prudent approach might lead us to think that the dents also occur seasonally, thus marking another periodicity.
See below the values distributed in a cyclical view.

A cycle graph. Image by the author.

As can be seen, the peaks occur in December. The other months do not reveal significant tapering, which leads us to reinforce the hypothesis of annual seasonality. Here it is also worth noting that the color gradation indicates a sequence of years: the closer to pink, the more recent the year. This configuration of colors closer to pink associated with higher values corroborates the growth trend already indicated at the beginning of the text.

Even though the graph gives the impression that the seasonality is annual, it is worth looking for more powerful analytical tools that can reduce the uncertainty of purely visual analysis. The literature recommends the Fourier transform for identifying the most critical frequencies in periodic signals. Following the guideline of excluding the trend from the series, I used the R language to identify the most fundamental frequencies, and that is where the surprises began. See the figure below.
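To make the "excluding the trend" step concrete, here is a minimal Python sketch (the analysis itself was done in R, and this is not that code). It assumes a simple least-squares straight line as the trend, which is only one possible choice and may be too crude for a series that grows and then stabilizes. The function name and the sample series are hypothetical.

```python
def detrend_linear(values):
    """Subtract the least-squares straight line from a sequence of values."""
    n = len(values)
    t_mean = (n - 1) / 2                      # mean of the time index 0..n-1
    y_mean = sum(values) / n
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


# Hypothetical series: a rising line plus a bump at every 12th point.
raw = [100 + 2 * t + (30 if t % 12 == 11 else 0) for t in range(48)]
detrended = detrend_linear(raw)
```

After this step the values fluctuate around zero, so the Fourier transform is no longer dominated by the slow upward drift.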

The most crucial seasonality. Image by the author.

The figure above is an adaptation of the most common output from Fourier transform analysis. Usually, the horizontal axis shows the frequencies and the vertical axis shows the spectrum associated with each frequency, indicating its importance. Frequency, by definition, is the reciprocal of the period (f=1/T, where f is frequency and T is period). For our case, we are initially more interested in the period that marks the seasonalities, so I have chosen to present this variable on the horizontal axis.
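The relation f=1/T can be sketched in a few lines of Python. The example assumes 23 full years of monthly data (276 points), which is my reading of the text rather than a confirmed property of the dataset, and the function names are hypothetical.

```python
# Assumed series length: 23 years of monthly observations.
N_MONTHS = 23 * 12  # 276


def bin_to_period(k, n=N_MONTHS):
    """Period, in months, of frequency bin k for a series of n monthly points."""
    if k == 0:
        raise ValueError("bin 0 is the zero frequency; it has no period")
    # f = k / n cycles per month, so T = 1 / f = n / k months per cycle.
    return n / k


def period_to_bin(period, n=N_MONTHS):
    """Frequency bin whose period is exactly `period` months."""
    k, remainder = divmod(n, period)
    if remainder:
        raise ValueError("this period does not fall exactly on a frequency bin")
    return k


annual_bin = period_to_bin(12)     # 23
bimonthly_bin = period_to_bin(2)   # 138
```

Plotting the spectrum against `bin_to_period(k)` instead of `k` gives a horizontal axis in months, as in the figure.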

As you can see, the period associated with the highest spectrum corresponds to two time units, in this case, two months. The most critical seasonality is the bimonthly one and not the annual one. Incidentally, in the same graph above, several other seasonalities surpass the 12-month one in associated spectrum.

But what explains this finding? A detailed breakdown of the algorithm associated with the Fourier Transform can help our understanding. Follow me in the next graphs and paragraphs.

Representing months in a set of frequencies. Image by the author.

The Fourier transform of a time function decomposes the signal into a sum of the frequencies present in the original time function. In our case, we use the series without trend to check the contribution of 144 different frequencies. All these 144 frequencies contribute to the actual process, but only a few are significant. Since we are interested in a time series with monthly data, we highlight and name in the figure above the bimonthly, trimonthly, four-monthly, semiannual, and annual frequencies.

The values resulting from a Fourier transform are complex numbers of the type a+bi, where a is the real component and b the imaginary. The graph’s horizontal axis corresponds to the real element, and the vertical axis corresponds to the imaginary part.
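As an illustration of where these complex numbers come from, here is a minimal Python sketch of a naive discrete Fourier transform. It is written for clarity, not speed, and it is not the code used for the article. The two-year sample series is hypothetical.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform: one complex number per frequency bin."""
    n = len(values)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(values))
        for k in range(n)
    ]


# Hypothetical detrended series: two years in which only December stands out.
series = [0.0] * 11 + [10.0] + [0.0] * 11 + [10.0]
coefficients = dft(series)

# Bin n/12 completes one cycle every 12 months (the annual frequency).
annual = coefficients[len(series) // 12]
real_part, imaginary_part = annual.real, annual.imag
modulus = abs(annual)
```

Each coefficient is a number of the form a+bi: `real_part` is a, `imaginary_part` is b, and `modulus` is the length of the corresponding vector.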

Note also that the graph seeks to represent a circle with a radius of 1000. The radius measures the so-called modulus of the complex numbers represented at each white point. We will apply trigonometric functions to this value to get the real and imaginary components.

With these initial explanations, we can better understand the frequencies shown in the picture. Look at the chart of the bimonthly frequency. We see two points occupying the horizontal or real axis: one is to the left of zero, and the other is to the right. The right point relates to the numbers 1, 3, 5, 7, 9, and 11, and the left one to 2, 4, 6, 8, 10, and 12. Each of these numbers refers to one of the months of the year. In this way, the even months are opposed to the odd months.

The point of the odd months is at the 0-degree angle of the circle. For this case, we have that the value on the horizontal axis is 1000 and on the vertical axis is 0. With this, we have the ordered pair (1000,0). In terms of complex numbers, this point corresponds to 1000+0i, which we calculate from trigonometric functions:

1000*cosine(0º) + 1000*sine(0º)i.

Since cosine(0º)=1 and sine(0º)=0, we go back to 1000+0i.

The even months’ point is at degree 180 on the circle. Here the value on the horizontal axis is -1000 and on the vertical axis is 0. In complex number notation, it corresponds to -1000+0i. Using sine and cosine:

1000*cosine(180º) + 1000*sine(180º)i.

Since cosine(180º)=-1 and sine(180º)=0, we have: -1000+0i.

Let us now jump to the annual frequency. Notice that equal angles separate the numbers. Since there are 12 points in a 360° circle, each point is 30° from the other. Thus, the point for month 12 represents a complex number from:

1000*cosine(30º)+1000*sine(30º)i.

That is: 866.03+500i.
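The three points worked out above can be reproduced with a short Python sketch. The angle rule used here, minus 360° times (month − 1) divided by the period, is my assumption: it is the standard transform kernel with January as time index 0, and it matches the angles quoted in the text (odd months at 0°, even months at 180°, month 12 at 30° for the annual frequency). The function name is hypothetical.

```python
import cmath
import math

RADIUS = 1000  # modulus of the illustrative circle


def month_point(month, period, radius=RADIUS):
    """Complex point for `month` (1-12) at a frequency with `period` months.

    Assumed convention: angle = -2*pi*(month - 1)/period, i.e. January is
    time index 0 and the rotation follows the transform's negative exponent.
    """
    angle = -2 * math.pi * (month - 1) / period
    return cmath.rect(radius, angle)  # radius*cos(angle) + radius*sin(angle)i


odd = month_point(1, period=2)          # approximately  1000 + 0i   (0 degrees)
even = month_point(12, period=2)        # approximately -1000 + 0i   (180 degrees)
december = month_point(12, period=12)   # approximately 866.03 + 500i (30 degrees)
```

Calling `month_point(m, period=12)` for m from 1 to 12 gives the other annual points suggested as an exercise below.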

One suggestion: try to calculate the other points for the annual frequency and observe the presence of positive and negative values in each month’s real and imaginary components. Another suggestion: check the different frequencies and the points associated with each of them. Observe the pattern that comes from the composition, distribution, and the months’ opposition.

For now, we will stop here in this review. We will need this notation later to calculate the vectors resulting from the combination of the real and imaginary numbers for each point calculated from the analyzed series.

Time series decomposition with Fourier Transform in practice. Image by the author.

The graph above shows how, in practice, we decompose the detrended time series according to the bimonthly or annual frequency for each of the 12 months and six randomly chosen years. I highlighted in each of the years the value assumed for December. In year 3, for example, this value is 41874. In year 7, it is 42314, and so on. These values are the moduli of complex numbers. Notice that month 12 is usually far away from the other months. See especially in year 20 how month 12, December, has a much higher modulus value than the others. Remember that we can represent these points in complex number notation:

Bimonthly frequency

  • Month 12 Year 3:

41874*cosine(180º) + 41874*sine(180º)i = -41874+0i

  • Month 12 Year 7:

42314*cosine(180º) + 42314*sine(180º)i = -42314+0i

Yearly frequency

  • Month 12 Year 3:

41874*cosine(30º) + 41874*sine(30º)i = 36263.95+20937i

  • Month 12 Year 7:

42314*cosine(30º) + 42314*sine(30º)i = 36645+21157i

It is now time to add up the values of the real and imaginary components separately for each year and, with this, calculate the resulting vectors that will allow us to verify which delivers the greater spectrum, the annual frequency, or the bi-monthly frequency. See what the figure below shows.

Some resultant vectors. Image by the author.

The graph above shows the resultant vector for six randomly chosen years. We calculate each resultant vector by adding the complex numbers associated with every month at a given frequency. In other words, for each year and for each of the two frequencies analyzed, we add up the real components and the imaginary components separately. These two sums are the projections of the resultant vector on the horizontal and vertical axes, represented in the graph by the dotted lines. The solid line represents the resultant vector itself, and the dot at its end indicates its direction.

We calculate the modulus of the resultant vector by applying the Pythagorean theorem: √((Σ real)² + (Σ imaginary)²). In year 3, for example, this value was 9832 for the annual frequency and 13002 for the bimonthly frequency.
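The yearly resultant and its modulus can be sketched in Python as follows. The twelve values are hypothetical, not taken from the article's data, and the angle convention (January as time index 0) is an assumption chosen to match the angles quoted in the text.

```python
import cmath
import math


def resultant(monthly_values, period):
    """Sum of the twelve monthly complex points of one year at a given period."""
    total = 0j
    for month, value in enumerate(monthly_values, start=1):
        angle = -2 * math.pi * (month - 1) / period  # assumed angle convention
        total += value * cmath.exp(1j * angle)
    return total


# Hypothetical detrended values for one year, January to December.
year = [3.0, 5.0, 2.0, 6.0, 3.0, 5.0, 2.0, 6.0, 3.0, 5.0, 2.0, 14.0]

moduli = {}
for label, period in (("annual", 12), ("bimonthly", 2)):
    z = resultant(year, period)
    # Pythagorean theorem on the summed real and imaginary components.
    moduli[label] = math.sqrt(z.real ** 2 + z.imag ** 2)
```

For these made-up values the bimonthly modulus is 26, the difference between the sum of the even months (41) and the sum of the odd months (15).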

Note that over the years, the modulus of the resultant vector for the bimonthly frequency is usually larger than that for the annual frequency. Also, notice that at the yearly frequency, the vectors’ directions are in the same quadrant as month 12, i.e., between 0º and 90º. As for the bimonthly frequency, the vectors align with months 2, 4, 6, 8, 10, and 12, i.e., at 180º.

Now we come to the last step: calculating the final resultant vector for each frequency. See the graph below.

The vector resultant of the whole series. Image by the author.

The figure above shows the total resultant vectors considering the 23 years. Notice that the modulus of the bi-monthly frequency, 464842, is greater than the modulus of the annual frequency, 366136. We confirm that the bimonthly frequency is mathematically associated with a larger spectrum than the yearly frequency.

The figure also allows us to show which bimonthly combination prevails from the direction of the vector. Notice that this direction is pointing to even months. We conclude that the sum of the even months’ values is greater than that of the odd months’ values. One of the possibilities for generating this configuration is that each even month has a higher value than the odd month that precedes it.
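The link between the direction of the bimonthly vector and the even-versus-odd comparison can be checked with a small Python sketch. The two years of values are hypothetical, and the convention that odd months sit at 0° and even months at 180° follows the text.

```python
import cmath
import math

# Hypothetical detrended values, one row per year, January to December.
years = [
    [3.0, 5.0, 2.0, 6.0, 3.0, 5.0, 2.0, 6.0, 3.0, 5.0, 2.0, 14.0],
    [4.0, 6.0, 3.0, 7.0, 4.0, 6.0, 3.0, 7.0, 4.0, 6.0, 3.0, 16.0],
]


def bimonthly_resultant(rows):
    """Total resultant at the two-month period over all years."""
    total = 0j
    for row in rows:
        for month, value in enumerate(row, start=1):
            # Odd months land at 0 degrees, even months at 180 degrees.
            total += value * cmath.exp(-1j * math.pi * (month - 1))
    return total


total = bimonthly_resultant(years)
direction_deg = math.degrees(cmath.phase(total)) % 360

odd_sum = sum(v for row in years for m, v in enumerate(row, start=1) if m % 2 == 1)
even_sum = sum(v for row in years for m, v in enumerate(row, start=1) if m % 2 == 0)
```

Because the odd months add to the real axis and the even months subtract from it, the modulus equals the absolute difference between `even_sum` and `odd_sum`, and a direction of 180° means the even months carry the larger total.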

The following figure helps to verify this possibility.

Values accumulated by months. Image by the author.

The figure above shows the accumulated values for the whole time series. There are minor differences between an odd month and the even month that follows it. The only exception is for month 12, which is much higher than month 11.

With this last figure, it becomes clear why the bimonthly periodicity represents the time series analyzed in this text better than any other arrangement of months, including the annual one, which was our bet at first glance.

Data, code, and contacts

The data comes from the Brazilian National Treasury Secretariat’s open data portal. Specifically, I used the series “1.3 — Arrecadação Líquida para o RGPS”. The dataset can also be consumed by this R package here. As the data source is licensed using the ODbL license, anyone may use the data for any purpose.

The reader can access the data and code from my GitHub. Consider looking me up on Twitter for further clarification.

Acknowledgments

I thank Fernanda Peixoto Souto, Luis Felipe Coimbra Costa, and Milena Auzier for their suggestions and feedback.

Doctor in Business Administration from UNB (2014). As a data scientist, he researches and implements products for transparency in the Brazilian public sector.