A Quick Introduction to Time Series Analysis

Preliminary Details required for Forecasting.

Robby Sneiderman
Towards Data Science

--

Nathan Dumlao via Unsplash

In my first article on Time Series, I hope to introduce the basic ideas and definitions required to understand Time Series analysis. We will start with the essential mathematical definitions needed to implement more advanced models. The material is presented in a similar manner to a McGill graduate course on the subject, following the style of the textbook by Brockwell and Davis.

Introduction:

A ‘Time Series’ is a collection of observations indexed by time. The observations each occur at some time t, where t belongs to the set of allowed times, T.

Figure 1: The general notation used to represent a time series Xt.

Note: T can be discrete, in which case we have a discrete time series, or continuous, in which case we have a continuous time series. Sometimes, we refer to one observation of the time series {Xt} as a realisation of the series.

Examples of time series include the Dow Jones index, a simple series indicating whether or not it rained each day, or GDP by year.

A ‘Time Series Model’ for a time series {Xt} is a specification of the joint probability distribution of the sequence of random variables (in practice, however, we often only specify the mean and first few moments).

Except in special cases, a time series will have a defined and finite mean;

Figure 2: Notation for the mean.

And, provided higher-order moments exist, the covariance of the time series between time ‘t’ and time ‘s’;

Figure 3: The Covariance of a time series Xt between two times.

The mean of the time series may or may not depend on t. Similarly, the covariance between two times may or may not depend on t; this will become clearer with examples.

Stationarity:

A Time Series is said to be ‘weakly stationary’ if the following two conditions hold.

  1. The mean value of the time series does not depend on time.

  2. The covariance between any two points that are the same distance apart is constant. (For example, the covariance between realisations three steps apart should be the same regardless of t.)

Note: it will be useful to recall that covariance (and hence variance) can be expanded linearly as follows;

Figure 4: Covariance (and variance) can be expanded. This often helps in simplifying calculations.

If a Time Series is weakly stationary, the covariance between any two points will not depend on time; it will either be constant or depend only on how far apart the two points are. Weakly stationary series are much easier to forecast with; thus, much of time series analysis involves trying to reduce a more complicated series to a stationary one.

Note: Strong (strict) stationarity requires that the entire joint distribution does not change with time, which is a harder requirement to verify and meet. We thus focus on weak stationarity for now.

Autocovariance Function (ACVF):

Figure 5: The ACVF for a stationary time series does not depend on t.

Autocorrelation Function (ACF):

In practice, we more often work with the autocorrelation function of a time series. In particular, the ACF of a stationary time series is defined as;

Figure 6: The ACF for a stationary time series.

The above two definitions assume well-defined random variables. In practice, however, we work with real data, and so we do not know the theoretical ACVF or ACF. Thus we need to introduce their sample counterparts.

Sample ACVF and ACF:

It is much more common that we are working with data from an unknown distribution (i.e. an observed dataset), so it is necessary to define the following: the sample mean, the sample autocovariance function and the sample autocorrelation. These will be used very often, and plotting them gives us an idea of the structure of our time series.

Figure 7: The sample mean, sample autocovariance and sample autocorrelation. We can calculate these on observed data.
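As an illustration (this sketch is mine, not from the course or textbook), the sample statistics above can be computed in a few lines of R and checked against the built-in acf function:

```r
# Minimal sketch: sample mean, sample autocovariance and sample
# autocorrelation at lag h, using the usual divide-by-n convention.
sample_acvf <- function(x, h) {
  n <- length(x)
  xbar <- mean(x)                 # sample mean
  h <- abs(h)
  sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
}
sample_acf <- function(x, h) sample_acvf(x, h) / sample_acvf(x, 0)

set.seed(1)
x <- rnorm(200)                   # some example data
sample_acf(x, 1)                  # hand-rolled estimate at lag 1
acf(x, lag.max = 1, plot = FALSE)$acf[2]   # R's built-in estimate, should match
```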

Fundamental Time Series:

Let us now begin to look at the fundamental theoretical time series that are used to build advanced models.

IID Noise:

One of the simplest examples of a time series is IID noise (independent, identically distributed noise). This is a series, {X1, X2, …}, where each realisation Xt is independent and drawn from the same distribution, such as a Normal(0, σ²).

Is IID Noise weakly stationary?

By definition, the expected value at any time is constant (zero). That is the first criterion.

And the covariance between any two points h units apart is;

Figure 8: Covariance of IID noise.

This does not depend on t (it depends only on h), satisfying the second condition. Thus, IID Noise is weakly stationary. We can also calculate the autocorrelation, which is simply 1 if h is zero and 0 otherwise.
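As a quick sanity check (my own sketch, not part of the original article), we can simulate IID noise in R and confirm that its sample ACF is essentially zero at every non-zero lag:

```r
# Simulate Gaussian IID noise and inspect its sample ACF: apart from
# lag 0 (which is always 1), the bars should stay close to zero.
set.seed(42)
x <- rnorm(500, mean = 0, sd = 1)
acf(x, main = "Sample ACF of IID noise")
```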

Random Walk:

Suppose now each Xt is an IID Normal(0, σ²). Then, the Random Walk time series, {St}, is defined as;

Figure 9: The Random Walk Time Series.

The definition makes it clear why it is called a random walk. The first element of the time series is just a random IID observation. We then add another random observation and repeat.

Figure 10: Animation illustrating the random walk in two dimensions. At each step, we take a random step along a grid. Gif Citation: Creative Commons.

Since each X is an IID Normal, the expected value of St at any time t is equal to 0. Moreover, from the definition of variance;

Var(St) = E(St²) - E(St)² = E(St²) = tσ².

We can thus simplify the covariance between points t and t+h using the linear expansion;

Figure 11: The covariance of points h units apart on a random walk. This depends on t.

Since this depends on t, the random walk is not weakly stationary.
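To see the non-stationarity numerically, here is a small simulation sketch (mine, assuming σ² = 1): the variance of a random walk grows roughly linearly in t, in line with the tσ² result above.

```r
# Simulate many random walks (cumulative sums of IID noise) and
# estimate Var(S_t) across the simulated paths at each time t.
set.seed(42)
n_paths <- 2000
n_steps <- 200
steps <- matrix(rnorm(n_paths * n_steps), n_paths, n_steps)
S <- t(apply(steps, 1, cumsum))           # each row is one walk S_1, ..., S_200
plot(apply(S, 2, var), type = "l",
     xlab = "t", ylab = "estimated Var(S_t)")   # approximately the line t * sigma^2
```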

White Noise:

White noise (sometimes called ‘static’) is similar to IID noise. If {Xt} is a sequence of uncorrelated, zero-mean observations with the same variance σ², we say it is White Noise. IID Noise is White Noise, but not all White Noise is IID.

Figure 12: Notation used for White Noise. With specified mean and variance.

What is the difference between IID Noise and White Noise? You may notice that the definition of White Noise puts no restrictions on the higher-order moments, so it says nothing about, for example, the tenth moment of Xt. For IID noise, by contrast, the entire distribution (and hence every moment) is the same at every time. Recall that, for zero-mean variables, uncorrelated means E(XtXs) = 0 whenever t ≠ s.
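A concrete example may help here (this construction is my own illustration, not from the article): the series Xt = Zt·Zt-1, with Zt IID Normal(0, 1), has zero mean, constant variance and zero correlation at every non-zero lag, so it is White Noise, yet Xt and Xt+1 share the factor Zt and are therefore dependent rather than IID.

```r
# White Noise that is not IID: X_t = Z_t * Z_{t-1} with Z_t IID N(0, 1).
set.seed(42)
z <- rnorm(5001)
x <- z[-1] * z[-5001]               # X_t = Z_t * Z_{t-1}, length 5000
acf(x, plot = FALSE)$acf[2]         # lag-1 sample autocorrelation: near 0
cor(x[-1]^2, x[-5000]^2)            # but the squares at adjacent lags ARE correlated
```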

Extra Facts about White Noise:

  • White Noise has constant power spectral density.
  • A single realisation of White Noise is known as a random shock.

Moving Average Model:

The moving average model is one of the most fundamental time series models. We consider the simplest version, the MA(1) model, which can be written as a sum of White Noise terms, one of which is scaled by a real parameter θ.

Figure 13: Example of a MA(1) model. The Z terms are White Noise.

By the linearity of expectation, the expectation of the MA(1) model is clearly zero (both Z terms have mean zero) and thus constant for any t. The covariance between time t and time t+h can be derived; for h = 0, this is just the variance.

Figure 14: The covariance for realisations h = 0 units apart is equivalent to the variance.

For h ≠ 0, the covariance is non-zero if and only if t and t+h are exactly 1 unit apart, because;

Figure 15: The MA(1) model covariance will be non-zero for realisations 1 unit apart.

Otherwise, the covariance is zero, which can be confirmed by writing it out in the covariance form above. Thus:

Figure 16: The Autocovariance function for the MA(1) model.

We can also derive the Autocorrelation function, since we know the value of the ACVF at h = 0. Namely, γ(0) = σ²(1 + θ²), so the ACF is:

Figure 17: The Autocorrelation function of the MA(1) model.

Because the mean is constantly zero (independent of t) and the covariance is also independent of t, the MA(1) model is weakly stationary.
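As a quick numerical check (a sketch of mine, not from the article), we can simulate an MA(1) series and compare the lag-1 sample autocorrelation with the theoretical value θ/(1 + θ²):

```r
# Simulate an MA(1) series with theta = 0.7 and compare sample vs
# theoretical lag-1 autocorrelation.
set.seed(42)
theta <- 0.7
x <- arima.sim(model = list(ma = theta), n = 5000)
acf(x, plot = FALSE)$acf[2]     # sample ACF at lag 1
theta / (1 + theta^2)           # theoretical ACF at lag 1, about 0.47
```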

Note:

  • The MA(q) model is similar, but includes the previous q White Noise terms.

Autoregressive:

The autoregressive model is another fundamental time series that is used as a building block for more advanced series. For autoregressive models, we make the assumption that {Xt} is weakly stationary. Autoregressive models depend on their own previous values, plus a White Noise term Zt that is uncorrelated with the past values of the series. The AR(1) series is the simplest;

Figure 18: The AR(1) model.

To derive the characteristics of the series, note that since we assume {Xt} is stationary, the expected value of the series at any time t is exactly 0 (taking expectations of both sides gives μ = ϕμ, so μ = 0 whenever ϕ ≠ 1).

We can calculate the Autocovariance function for any points h units apart;

Figure 19: Since we are assuming weak stationarity, we can drop the additional argument to γ and just write h. We use the assumption that the covariance is the same for all pairs of points h units apart to simplify.

By expanding and using the linearity of the covariance function, we obtain the simplified form;

Figure 20: The simplified covariance function for realisations h units apart on the AR(1) model.
Figure 21: The variance of the AR(1) model using the assumption of weak stationarity.

Solving for the variance by bringing the γ(0) terms over to one side of the equation gives us;

Figure 22: The Variance of a realisation from the AR(1) model.

Since γ(h)=γ(-h) (covariance is symmetric), we can greatly simplify the autocorrelation (ACF):

Figure 23: The Autocorrelation of the AR(1) model for realisations h units apart.
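Again as a quick check (my own sketch), we can simulate an AR(1) series and compare its sample ACF at the first few lags with the theoretical ϕ^h:

```r
# Simulate an AR(1) series with phi = 0.8 and compare sample vs
# theoretical autocorrelations at lags 1 to 3.
set.seed(42)
phi <- 0.8
x <- arima.sim(model = list(ar = phi), n = 5000)
round(acf(x, lag.max = 3, plot = FALSE)$acf[2:4], 3)   # sample ACF at lags 1, 2, 3
round(phi^(1:3), 3)                                    # theoretical: 0.800 0.640 0.512
```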

Extra Facts:

  • Just like MA(q) models, AR models can be expanded to include an arbitrary number of terms, i.e. AR(p) models.
  • The AR model is not always stationary, in particular if it contains a unit root. However, for this article we make the common assumption that the series has no unit root (|ϕ| < 1).
  • AR models are a special case of VAR models (Vector Autoregressive models).

Moving On:

The moving average model, the autoregressive model and White Noise form the basis for most of the time series models used in practice. For example, they are the building blocks of the ARMA and ARIMA models. Now that we have covered some of the theoretical time series, let's move on to time series in practice.

What are some common characteristics of a time series? Let us start with a simulated example to get the idea. Consider the following plot of a time series;

Figure 24: A simulated time series with an upwards trend and seasonal component.

You may notice a few features of the plot;

  • A generally increasing ‘trend’
Figure 25: The same time series with trend indicated by the red line.
  • A repeating or ‘seasonal’ component
Figure 26: Seasonal components are repeating at fixed intervals.

Trend and Seasonality make up a fundamental portion of a time series. Indeed, much of time series analysis and forecasting involves trying to understand the trend and seasonal components of the series. The importance of these two qualities leads to the ‘fundamental decomposition’.

Fundamental Decomposition:

It is useful to think of a time series as consisting of three distinct parts: the Trend, the Seasonality, and the Random Noise.

Figure 27: The Fundamental decomposition of a time series into trend (mt), seasonality (st) and a random component.

Where the expected value of the noise Yt is zero.
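As an illustration of this decomposition (a sketch of mine, assuming a monthly series with a linear trend and a yearly cycle), R's built-in decompose function splits a ts object into exactly these three components:

```r
# Simulate a monthly series = linear trend + yearly seasonality + noise,
# then decompose it into trend, seasonal and random components.
set.seed(42)
t <- 1:120                                           # ten years of monthly data
x <- ts(0.05 * t + 2 * sin(2 * pi * t / 12) + rnorm(120), frequency = 12)
plot(decompose(x))   # panels: observed, trend, seasonal, random
```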

Trend: Trend refers to the general slope of the series over a given stretch of time. For example, the series could be trending upwards over a certain time period, or trending downwards. Series with trend will generally not be stationary, as the mean changes depending on the time.

Seasonality: This is more than just a trend; it is a repeating pattern, which could be weekly, yearly or at some other fixed interval. Seasonality represents a repeated and clear change in a time series.

  • Fitting seasonality can be done using harmonic regression. This involves, for example, fitting the series with many sines and cosines (a simplification).
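Here is a minimal sketch of the harmonic regression idea (my own example, using a single harmonic at the yearly frequency; real series typically need several harmonics): we regress the series on a linear trend plus one sine and one cosine term, and the residuals should then look close to stationary.

```r
# Harmonic regression sketch: fit trend + one yearly harmonic by least squares.
set.seed(42)
t <- 1:120
x <- 0.05 * t + 2 * sin(2 * pi * t / 12) + rnorm(120)  # simulated monthly data
fit <- lm(x ~ t + sin(2 * pi * t / 12) + cos(2 * pi * t / 12))
plot(t, x, type = "l")
lines(t, fitted(fit), col = "red")   # fitted trend + seasonal component
acf(residuals(fit))                  # residual ACF should show little structure
```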

In Practice:

Up to now we have studied the mathematical details that describe the ideal time series. In reality, we will almost never have a series that is completely represented by a moving average or autoregressive model. These ideal models simply form the skeleton that we will use to fit more advanced models. We will also demonstrate why the sample ACF is useful and what it can tell us about a time series.

Let us work through a few real examples and see what we can learn from them. We will use the R package ‘itsmr’, which comes preloaded with several datasets.

Australian Red Wine Data:

This dataset (‘wine’ in the itsmr package) consists of 142 monthly observations of red wine sales in Australia (in kilolitres). Let's plot it.

Figure 28: Australian Wine Sales data. Plot made in R using the built-in dataset wine. We notice an upwards trend and seasonal component.

Clearly there is a trend, and also a seasonal component. Overall, wine sales are increasing, and seasonally, wine sales increase in the summer and decrease in the winter months. Let's take a look at the sample ACF. R can calculate this automatically using the acf function.
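For reference, a minimal sketch of the R commands (assuming the itsmr package is installed; the January 1980 start date follows Brockwell and Davis):

```r
# Plot the Australian red wine sales data and its sample ACF.
library(itsmr)                                   # provides the 'wine' dataset
wine_ts <- ts(wine, start = c(1980, 1), frequency = 12)
plot(wine_ts, ylab = "Red wine sales")
acf(wine, lag.max = 50)                          # sample ACF up to lag 50
```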

Figure 29: The sample ACF of the Australian Wine sales data up to the first 50 lags.

Note: a slowly decaying sample ACF (|ρ(h)|) is indicative of a trend, and thus of non-stationarity! This makes sense; we saw that wine sales were trending upwards.

A periodic sample ACF is indicative of seasonality in the time series. This also makes sense, as we saw that wine sales soar in the summer months and are at a minimum in the winter months.

To forecast with the Australian wine data, we would thus need to account for the trend and seasonality. Trend can often be reduced by applying a monotonic transformation such as the log transform, which helps bring the data closer to a stationary series. Other methods include smoothing, filtering and differencing.

The seasonal component must also be fit; usually this can be done using trigonometric functions via harmonic regression.
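One standard recipe (not spelled out in this article, but a common alternative to harmonic regression) is to take logs and then difference the series; a rough sketch:

```r
# Log-transform to tame the growth, difference at lag 12 to remove the
# yearly seasonal pattern, then difference once more for remaining trend.
library(itsmr)
y <- diff(diff(log(wine), lag = 12))
plot.ts(y)
acf(y, lag.max = 50)   # should decay much faster than the raw series' ACF
```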

Summary:

This article introduced the basic mathematical details required to study time series analysis. The Moving Average model, the Autoregressive model and White Noise form the fundamental building blocks for more advanced series. We learnt about stationarity, which describes how a series changes over time. We also learnt about the important sample statistics to examine when working with time series, such as the sample autocorrelation function.

Finally, we discussed the common decomposition of a time series into a trend, seasonal and random component, and touched on why these are important.

In future articles, I hope to cover in more detail how we can actually forecast time series. The general steps are as follows;

  1. Plot the time series.

  2. Determine the trend and seasonal components. Consider transforming the variables if needed (such as taking the log transform).

  3. Remove (subtract) the trend and seasonal components to get stationary residuals.

  4. Fit a model to the residuals.

  5. Forecast the residuals, then obtain forecasts for the original series by inverting the transformations.

Congratulations, you have now learnt the basics of time series analysis. Once you are familiar with these fundamentals, you are in a position to move on to more advanced topics such as forecasting. In my next articles on Time Series I hope to introduce the ARMA and ARIMA models and discuss Box-Jenkins, Holt-Winters, signal processing with Fourier transforms, and ARCH/GARCH/FGARCH models.

Thank you for reading! Did you enjoy this article or learn something new? If so, please consider checking out my other articles on medium, and consider giving the article a clap or a share. Also, please feel free to leave a comment or correction below.

Sources:

[1] Brockwell and Davis (2002). Introduction to Time Series and Forecasting.

[2] TensorFlow in Practice Specialization (2020). Sequences, Time Series and Prediction (Coursera via DeepLearning.AI).

[3] Pennsylvania State University, STAT 510 course webpage.

[4] Shumway and Stoffer (2011). Time Series Analysis and Its Applications, Third Edition.

[5] Fuller (2009). Introduction to Statistical Time Series.
