Hands-On Tutorials

Functional Time Series

When we measure data more frequently, how can we best analyze it? Functional Data Analysis (FDA) can simplify the analysis drastically.

Florian Heinrichs
Towards Data Science
11 min read · Jan 5, 2021


Photo by Rostyslav Savchyn on Unsplash

As memory becomes cheaper and cheaper, more and more data is stored. In the case of time series, this means that data is collected at ever higher frequencies.

It is not clear, however, how to model time series that are recorded with a (very) high frequency, especially when (multiple) seasonality is involved.

Shall we consider the data as a univariate time series or as a high-dimensional multivariate time series? In many cases it is best to view the observations as functions of time and to analyse the resulting functional time series. For example, we can divide intraday stock prices into daily observations, so that for every day we observe the price as a function of the time of day. (Sounds technical, but we’ll see an example later!)

This latter approach is particularly useful when the functional data is continuous, because continuity implies more structure than arbitrary high-dimensional vectors have: if x is close to y, then f(x) is close to f(y) for a continuous function f, whereas x(i) need not be close to x(i+1) for a vector (x(1), …, x(d)).

Functional data arises naturally in different settings such as medicine (EEG data), finance (stock prices) or meteorology (temperature).

In this blog post, we go through examples and try to develop an intuitive understanding of functional time series. Additionally, hypothesis tests for the assumptions of independence and stationarity of functional time series are introduced.

Note: Throughout it is assumed that you are familiar with basic concepts of time series analysis. If you want to refresh your knowledge or get started with the topic, check my previous blog post.

Introduction

Let’s start with an example. Imagine that we measured the temperature in a specific location over time and have collected one observation per day, so for year t and day i we have observation X(t,i), where i ∈ {1, …, d}, t ∈ {1, …, n} and d = 365 (number of days per year), as in Figure 1.

Figure 1: Daily temperature in Sydney from 2013 to 2017; x-axis: time, y-axis: temperature in Celsius; image by author

Now we have different options for how to approach the data. First, we could regard it as a univariate time series and simply concatenate the data in chronological order, so technically Y(s) = X(t,i) where s = (t-1)·d + i. In this case, we have seasonality, which makes the analysis more difficult.

We could also model the time series as a multivariate time series with as many dimensions as observations per year, such that every observation of the time series corresponds to the data collected during an entire year: Y(t) = ( X(t,1), …, X(t,d) ). Now we don’t have to take seasonality into account, but the dimension is very high (365 dimensions, to be precise). Of course, we can reduce the dimension by reducing the frequency of the observations. However, in this case we lose information, and it is not clear how to choose the frequency (weekly or monthly?).

A final approach is to consider the data as a functional time series Y(t,x), where we have a function Y(t, ·) for every year t with Y(t, i/d) = X(t,i). In this case, the yearly temperature is viewed as a function of time, and every observation corresponds to a function describing the temperature over one year. Figure 2 shows the data from Figure 1 viewed as a functional time series; a small code sketch of the three representations follows the figure.

Figure 2: Temperature in Sydney for different years; x-axis: time, y-axis: temperature in Celsius; image by author
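To make the three representations concrete, here is a minimal NumPy sketch with synthetic data; the array names and the toy temperature curve are purely illustrative:

```python
import numpy as np

# Toy example: n = 5 years of synthetic daily temperatures, d = 365 days per year.
n, d = 5, 365
rng = np.random.default_rng(0)
days = np.arange(n * d)
daily_temps = 18 + 5 * np.sin(2 * np.pi * days / d) + rng.normal(0, 2, n * d)

# Option 1: univariate time series of length n * d (seasonality remains).
Y_univariate = daily_temps

# Option 2: multivariate time series with d = 365 dimensions,
# one vector Y(t) = (X(t,1), ..., X(t,d)) per year.
Y_multivariate = daily_temps.reshape(n, d)

# Option 3: functional time series; row t is the function Y(t, .) evaluated
# on the common grid x = i/d of the rescaled interval [0, 1].
x_grid = np.arange(1, d + 1) / d
Y_functional = Y_multivariate
```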

A functional perspective on the data has important benefits. In the example, the mean temperature is clearly not constant throughout the year: the mean temperature in summer lies above the mean temperature in winter. So a non-stationary univariate time series might be stationary when modelled as a functional time series, because we then compare data that is reasonable to compare (e.g. the temperature in January 2016 with the temperature in January 2017 instead of July 2016).

Further, modelling the data as a functional time series is often more natural than using a high-dimensional time series because it adds an additional structure. For example, if we observe continuous functions, such as temperature or stock prices, the values for two close time instants are similar, whereas such a structure is not given for arbitrary multivariate time series.

This is similar in spirit to convolutional neural networks: due to their specific architecture, CNNs are restricted to a narrower set of problems, yet on this set of problems they work very well.

Functional data analysis (FDA) is an active area of research and can be used in various applications. In the following, we focus on a specific type of functional data, namely functional time series. Note however, that many classic results from statistics were generalized to functional data, such as t-tests to compare expected values of different groups.

Functional Time Series — Basics

Okay, so I mentioned functional time series, we saw an example, but what is a functional time series mathematically? And how is it different from a univariate time series?

Mathematically, there is only a small difference. A univariate real time series is a collection of real data indexed by time (see here). So for time instants 1, 2, …, n we observe real-valued data Y(1), Y(2), …, Y(n), such as the temperature at a specific location or the price of a certain share. A functional time series is basically the same, but we observe functions instead of real-valued data. In this case Y(1), Y(2), …, Y(n) are functions and might be written as functions in x, i.e. Y(1)(x), Y(2)(x), …, Y(n)(x). For simplicity, we often assume that the functions are defined on the interval [0,1] and rescale the interval if necessary (if f(x) is defined on an interval [0, T], then g(x) = f(xT) is defined on the interval [0, 1]).
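As a tiny illustration of this rescaling (the function f below is made up):

```python
import numpy as np

T = 365.0
f = lambda t: 20 + 5 * np.sin(2 * np.pi * t / T)  # toy function defined on [0, T]
g = lambda x: f(x * T)                            # same curve, rescaled to [0, 1]

print(f(T / 2), g(0.5))  # both evaluate the curve at the midpoint
```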

Technical Note: We work in a functional space rather than in the space of the real numbers and it is not clear how quantities such as the expected value or the covariance are defined for functional data. Luckily, the mean and covariance can be defined pointwise in most cases, such that the identities E[Y(i)](x) = E[Y(i)(x)] and Cov(Y(i)(x), Y(j)(z)) = Cov(Y(i),Y(j))(x,z) hold.
Depending on the assumptions such as continuity of the observations or L²-integrability, we might have additional structure (the space of continuous functions is a Banach space, the L²-space is a Hilbert space) and in these cases the pointwise definition can be justified.

Tests on Stationarity and Independence

As for univariate time series, the concepts of stationarity and independence drastically simplify any further analysis, so we want to know whether they are reasonable assumptions. On a conceptual level, the ideas are the same as in the univariate case: two functional observations are independent if their joint probability factorizes, and a time series is stationary if its distribution remains constant over time (see this blog post for a rigorous definition).

In a previous blog post, we saw how to validate the two assumptions for univariate time series. For functional data, we can do the very same thing, namely use the CUSUM statistic to determine whether a time series is (weakly) stationary and a Portmanteau-type test to validate (or reject) the null hypothesis of independence.

Testing for weak stationarity

When working with time series, we want to know if they are stationary, because in this case we do not need to take temporal changes into account. However, stationarity is difficult to measure, and we often use the time series’ moments as a proxy. Intuitively, if the moments do not change over time, we can neglect temporal changes of the underlying distribution. Thus, instead of testing for stationarity, we want to test whether the mean and (auto-)covariances of the given time series are time-invariant. For a functional time series Y(1)(x), Y(2)(x), …, Y(n)(x), this translates into the hypotheses

H₀: E[Y(1)] = E[Y(2)] = … = E[Y(n)]  vs.  H₁: E[Y(i-1)] ≠ E[Y(i)] for some i ∈ {2, …, n},   (1)

and, for every lag h ≥ 1,

H₀(h): Cov(Y(i-1), Y(i-1+h)) = Cov(Y(i), Y(i+h)) for all i  vs.  H₁(h): Cov(Y(i-1), Y(i-1+h)) ≠ Cov(Y(i), Y(i+h)) for some i ∈ {2, …, n-h}.

The time series Y(i) is weakly stationary if these null hypotheses hold for every positive integer h. Note that the first- and second-order moments of Y(i) are functions themselves, so the equality of two moments depends on the function space. In the space of continuous functions, for example, two functions f and g are equal if they coincide in every point, i.e. if f(x) = g(x) for all x. In the space of square-integrable functions, by contrast, two functions are equal if they coincide in almost every point (w.r.t. the Lebesgue measure).

Instead of testing the latter hypotheses for all lags h, we often restrict our attention to the first H lags (so to all h with 1 ≤ h ≤ H), as those fundamentally determine the behavior of the distribution. In the following, we will only test the hypotheses in (1), as the null hypotheses concerning the second-order moments can be tested analogously. Further, we assume the data to be square-integrable, so that it belongs to the space L²([0,1]) with norm denoted by ||.||. In this case, the testing problem in (1) is equivalent to

H₀: ||E[Y(i)] − E[Y(1)]|| = 0 for all i ∈ {2, …, n}  vs.  H₁: ||E[Y(i)] − E[Y(1)]|| > 0 for some i ∈ {2, …, n}.

As in the univariate scenario, we can employ the CUSUM statistic, which basically compares the average of the first observations with the average of the remaining ones. The (functional) CUSUM statistic is defined as

C(u, x) = 1/n · ( Y(1)(x) + … + Y(⌊un⌋)(x) − ⌊un⌋/n · ( Y(1)(x) + … + Y(n)(x) ) ),  u, x ∈ [0, 1].

Under the null hypothesis (and weak assumptions), √n C(u, x) converges weakly to a centered Gaussian process B(u, x) with unknown covariance function in the space L²([0,1]²) with norm ||.||₂. Under the alternative, √n ||C||₂ diverges to +∞. So if √n C(u, x) deviates too much from the typical behavior of its limit B(u, x), we can reject H₀.

Unfortunately, we don’t know the distribution of B(u, x), as we don’t know the covariance and need to estimate it. There are different ways to do so; one common approach in time series analysis is a block multiplier bootstrap approximation, which is basically a resampling scheme that takes temporal dependence into account.

If q(α) denotes the α quantile of ||B||₂ (obtained, for example, through a bootstrap procedure or direct estimation of the covariance), we can reject H₀ whenever √n ||C||₂ > q(1−α). This defines an asymptotically consistent level-α test for H₀.

A Portmanteau-type test

Similarly to stationarity, stochastic independence is difficult to measure, and we use the time series’ (auto-)covariance structure as a proxy to assess its degree of dependence. For simplicity, we assume that the time series is stationary and centered, that is, E[Y(i)] = 0 (the latter hypothesis can be tested analogously to the presented procedure). Again, we are mainly interested in autocovariances with small lags h. In line with the classic Portmanteau test, we consider the hypotheses

H₀: ||E[Y(1) Y(1+h)]||₂ = 0 for all h ∈ {1, …, H}  vs.  H₁: ||E[Y(1) Y(1+h)]||₂ > 0 for some h ∈ {1, …, H}.

As before, the second order moments of a functional time series are functions themselves, so we formulate the hypotheses in terms of their norms.

In order to avoid a multiple testing problem, we compare all moments simultaneously, by considering their maximum, instead of testing them individually.

As a test statistic, we can use (the maximum of) the empirical moments

Mₙ(h)(x, z) = 1/(n−h) · ( Y(1)(x) Y(1+h)(z) + … + Y(n−h)(x) Y(n)(z) ),  x, z ∈ [0, 1].

The estimator Mₙ(h) converges in probability to E[Y(1) Y(1+h)], the autocovariance at lag h. Thus, we can reject the null hypothesis if Mₙ(h) deviates significantly from its limit.

Under the null hypothesis, it holds that √n ||Mₙ(h)||₂ converges weakly to ||B(h)||₂ for some centered Gaussian variable B(h), and it diverges to infinity under the alternative.

Again, the covariance structure of B(h) is unknown, and its distribution can be approximated by means of a bootstrap procedure, as in the case of stationarity testing.

If q(α) denotes the (approximated) α quantile of max {||B(h)||₂ : 1 ≤ h ≤ H}, we can reject H₀ whenever

√n · max {||Mₙ(h)||₂ : 1 ≤ h ≤ H} > q(1−α),

which defines an asymptotically consistent level-α test for H₀.

Implementation in Python

For the implementation, we use climate data from Australia. More specifically, we use the daily minimum temperature in Sydney (station number 066062) from 1859 to 2017, provided by the Bureau of Meteorology of the Australian Government.

First, we need to load the required packages and prepare the data:
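A minimal sketch of this preparation step might look as follows. The file name and the column names are assumptions based on the Bureau of Meteorology’s CSV exports and may need to be adapted to the actual file; the array Y and the dimensions n and d defined here are reused in the sketches below.

```python
import numpy as np
import pandas as pd

# Hypothetical file name for the BOM export of station 066062.
df = pd.read_csv("IDCJAC0011_066062_Data.csv")
temp_col = "Minimum temperature (Degree C)"  # adapt to the actual column name

# Drop 29 February so that every year has exactly d = 365 days, and
# fill missing temperatures by interpolation.
df = df[~((df["Month"] == 2) & (df["Day"] == 29))].copy()
df[temp_col] = df[temp_col].interpolate().bfill()

# Keep complete years only (the file is assumed to be sorted by date).
counts = df.groupby("Year").size()
full_years = counts[counts == 365].index
df = df[df["Year"].isin(full_years)]

# Reshape into an (n x d) array: one row per year, one column per day.
Y = df[temp_col].to_numpy().reshape(-1, 365)
n, d = Y.shape
```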

Testing for stationarity of the mean

In order to test for stationarity of the mean, we define three auxiliary functions: one to calculate the CUSUM statistic, one for the L²-norm, and one to generate bootstrap replicates for approximating the quantile.
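The following is a minimal sketch of these functions and of the test itself. The block length, the number of bootstrap replicates and the Gaussian multipliers are illustrative choices, so the printed values will not exactly match the output reported below.

```python
import numpy as np

def cusum(Y):
    # Functional CUSUM process C(k/n, x), evaluated on the grid k = 0, ..., n.
    n = Y.shape[0]
    partial = np.vstack([np.zeros(Y.shape[1]), np.cumsum(Y, axis=0)])
    u = np.arange(n + 1)[:, None] / n
    return (partial - u * partial[-1]) / n

def l2_norm(F):
    # L2-norm on the unit interval/square, approximated on an equispaced grid.
    return np.sqrt(np.mean(F ** 2))

def bootstrap_quantile(Y, n_boot=1000, block_len=20, alpha=0.05, seed=0):
    # Approximate the (1 - alpha) quantile of ||B||_2 with a block multiplier
    # bootstrap: the multipliers are constant within blocks to mimic dependence.
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    Y_c = Y - Y.mean(axis=0)
    n_blocks = int(np.ceil(n / block_len))
    u = np.arange(n + 1)[:, None] / n
    stats = np.empty(n_boot)
    for b in range(n_boot):
        xi = np.repeat(rng.standard_normal(n_blocks), block_len)[:n]
        partial = np.vstack([np.zeros(d), np.cumsum(xi[:, None] * Y_c, axis=0)])
        stats[b] = l2_norm((partial - u * partial[-1]) / np.sqrt(n))
    return np.quantile(stats, 1 - alpha)

# Test for a constant mean function at level alpha = 0.05
# (Y is the (n x d) array from the preparation step above).
test_stat = np.sqrt(n) * l2_norm(cusum(Y))
quantile = bootstrap_quantile(Y)
print(f"Test Statistic: {test_stat:.2f}")
print(f"Approximated quantile: {quantile:.2f}")
if test_stat > quantile:
    print("The null hypothesis can be rejected")
```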

Test Statistic: 2.11
Approximated quantile: 1.84
The null hypothesis can be rejected

The output suggests that we can reject the null hypothesis of a constant mean function. Thus, it is unlikely that the temperature was stationary in Sydney from 1859 to 2017, which suggests a change in climate.

Testing for uncorrelatedness

For the Portmanteau-type test as introduced earlier, we define two auxiliary functions. The first function calculates products of the (functional) observations, which are used later to calculate empirical moments more efficiently. The second function generates bootstrap replicates to approximate the quantile.
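A minimal sketch of the two functions and the test follows, under the same caveats as before (illustrative block length, replicate count and H = 5 lags). To keep the (d × d) product arrays manageable, the functions are evaluated on a coarser weekly grid:

```python
import numpy as np

def l2_norm(F):
    # L2-norm approximated on an equispaced grid (as above).
    return np.sqrt(np.mean(F ** 2))

def lagged_products(Y, h):
    # Pointwise products Y(i)(x) * Y(i+h)(z) for i = 1, ..., n-h;
    # returns an array of shape (n-h, d, d).
    m = Y.shape[0] - h
    return Y[:m, :, None] * Y[h:, None, :]

def portmanteau_quantile(Y, H=5, n_boot=200, block_len=20, alpha=0.05, seed=0):
    # Approximate the (1 - alpha) quantile of max_h ||B(h)||_2 with a
    # block multiplier bootstrap.
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    centered = [lagged_products(Y, h) for h in range(1, H + 1)]
    centered = [P - P.mean(axis=0) for P in centered]
    n_blocks = int(np.ceil(n / block_len))
    stats = np.empty(n_boot)
    for b in range(n_boot):
        xi = np.repeat(rng.standard_normal(n_blocks), block_len)[:n]
        stats[b] = max(
            l2_norm(np.tensordot(xi[:P.shape[0]], P, axes=1) / np.sqrt(n))
            for P in centered
        )
    return np.quantile(stats, 1 - alpha)

# Portmanteau-type test at level alpha = 0.05 with H = 5 lags, on a weekly
# grid (Y is the (n x d) array from the preparation step above).
Yg = Y[:, ::7]
n = Yg.shape[0]
M = [lagged_products(Yg, h).mean(axis=0) for h in range(1, 6)]
test_stat = np.sqrt(n) * max(l2_norm(m) for m in M)
quantile = portmanteau_quantile(Yg)
print(f"Test Statistic: {test_stat:.3f}")
print(f"Approximated quantile: {quantile:.3f}")
if test_stat > quantile:
    print("The null hypothesis can be rejected")
```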

Note that the calculations can take a moment because the quantile approximation is computationally expensive.

Note further that, for simplicity, we assumed the time series to be centered and stationary. Both assumptions are clearly not met in the given example. By subtracting an estimated (local) mean, we can generalize the methodology to non-centered time series, but the series remains non-stationary, as the previous test suggests. We use the data anyway to illustrate the methodology.

Test Statistic: 2587.612
Approximated quantile: 67.463
The null hypothesis can be rejected

The result suggests that the null hypothesis of uncorrelatedness can be rejected. As mentioned before, this result should be interpreted with caution, as it is reasonable to assume neither centeredness nor stationarity of the time series.

Conclusion

Functional data analysis is more technical than the analysis of univariate data, but has some important advantages and can be used in many applications. On a conceptual level, both approaches are similar and we can use (almost) the same ideas and techniques.

Beyond functional time series, FDA has many other important applications — it can even be used for dimensionality reduction which seems counterintuitive at first — and is a topic that we should be aware of as data scientists.


PhD in mathematical statistics, researcher in time series analysis, data scientist in industry and python enthusiast — linkedin.com/in/florian-heinrichs