
The best book to start learning about time series forecasting

And to make it even better – it's free!

Photo by Dollar Gill on Unsplash

Time series forecasting is a very interesting and challenging area of statistics and machine learning. Arguably, it can be a tougher problem to take on than a standard regression task, as we have to account for temporal dependencies and the additional assumptions they impose.

While it can be a daunting problem to solve, it is also a crucial one for pretty much every company, as they all deal with some kind of forecasting – sales, demand, revenue, growth, to name just a few. And this is exactly where knowledge of this field of data science can make a significant impact and provide added value for decision-makers.

In this article, I give a short overview and a personal opinion of the book that I think is a great starting place for learning about time series forecasting. That does not mean it is only a book for beginners – I am pretty sure even experienced data scientists will learn something new from reading it.

The book I have in mind is Forecasting: principles and practice by Rob J Hyndman and George Athanasopoulos. The book is available for free here, but you can also buy a paperback or digital version (via the links on the book’s website). I have previously read the 2nd edition of the book, but I was pleasantly surprised that the 3rd edition was released in February 2021, so I updated the article to include the latest information.

Source: https://otexts.com/fpp2/

As I mentioned, the book is a great place to start for people who would like to work on forecasting problems – professionally or personally. Readers can either already have some general understanding of the domain or be starting from scratch. To make the best use of the book, a basic background in maths (algebra) will definitely be helpful, as well as some knowledge of the R programming language.

The latter is definitely nice to have while going over the book, as it comes with a dedicated R library (fpp3) and extensively uses the tsibble and fable packages (instead of the forecast package used in the 2nd edition). Thanks to that, you can quickly replicate all the examples from the book or use the provided datasets to carry out analyses of your own.
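To give an idea of how little code is needed, here is a minimal sketch of the fpp3 workflow. It assumes the fpp3 meta-package is installed and uses the aus_production dataset that ships with it; attaching fpp3 also loads tsibble, fable, feasts, and ggplot2.

```r
# Minimal sketch: attach the book's meta-package and plot one of its datasets
library(fpp3)  # loads tsibble, fable, feasts, ggplot2, and the example data

# Quarterly Australian beer production, one of the datasets bundled with fpp3
aus_production %>%
  autoplot(Beer) +
  labs(title = "Australian quarterly beer production")
```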

Chapters’ overview

In this section, I will briefly go over the contents of the chapters and mention what, in my opinion, is most relevant in them. You can consider it a curated table of contents.

Chapter 1: Getting started

In this chapter, the authors provide an introduction to the domain of time series forecasting by defining basic terminology and outlining the general steps one needs to follow when approaching time series tasks. Additionally, they provide a few use cases of the techniques described in the book.

Chapter 2: Time series graphics

The second chapter contains a selection of very useful types of plots used for time series analysis, together with their interpretation and the conclusions we can draw from them. To give a few examples, the chapter covers time-series plots (line plot), seasonal plots, scatterplots, plots of the autocorrelation function (ACF), and more. In one of my articles, I showed how to quickly recreate a few of those plots in Python.
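For reference, the sketch below shows how a few of those plot types can be produced with the feasts package (attached via fpp3), again using the bundled aus_production data.

```r
# Sketch: a few of the chapter's plot types, built with feasts (attached via fpp3)
library(fpp3)

aus_production %>% autoplot(Beer)            # time series (line) plot
aus_production %>% gg_season(Beer)           # seasonal plot
aus_production %>% gg_subseries(Beer)        # seasonal subseries plot
aus_production %>% ACF(Beer) %>% autoplot()  # autocorrelation function (ACF) plot
```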

Chapter 3: Time series decomposition

This section is crucial for anyone working with time series. In it, the authors describe what time series decomposition is, what the trend/seasonal/remainder components are, and which methods we can use to decompose a time series (classical, X11, SEATS, STL). Then, they present the respective strengths and weaknesses we should keep in mind while decomposing a time series.
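As a rough illustration, an STL decomposition takes only a few lines with feasts. The sketch below uses the us_employment dataset bundled with fpp3; the window settings are arbitrary choices for the example.

```r
# Sketch: STL decomposition of a monthly series from the bundled us_employment data
library(fpp3)

us_employment %>%
  filter(Title == "Retail Trade", year(Month) >= 1990) %>%
  model(STL(Employed ~ trend(window = 7) + season(window = "periodic"))) %>%
  components() %>%
  autoplot()  # panels for the original series plus trend, seasonal, and remainder
```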

Chapter 4: Time series features

This chapter is focused on the automatic extraction of useful features from the underlying time series. The authors describe the feasts R package, which includes many functions for computing FEatures And Statistics from Time Series. By using the package, we can easily extract features based on the autocorrelation coefficients (for example, the sum of squares of the first X coefficients from the original/differenced series) or the strength of trend/seasonality (based on the STL decomposition).
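A minimal sketch of that feature extraction, using the tourism dataset bundled with fpp3 and two of the feature sets provided by feasts:

```r
# Sketch: extracting ACF-based and STL-based features for every series in a tsibble
library(fpp3)

tourism %>% features(Trips, feat_acf)  # autocorrelation-based features
tourism %>% features(Trips, feat_stl)  # trend/seasonal strength from an STL decomposition
```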

Chapter 5: The forecaster’s toolbox

In this chapter, the authors provide us with a basic set of tools that will be very helpful in most time series problems. They show how to:

  • prepare and transform the data for modeling,
  • use simple (naïve) models as benchmarks,
  • understand the concepts of the fitted values and residuals,
  • diagnose our models by looking at the distribution and autocorrelation of the residuals,
  • use a selection of popular evaluation metrics for point forecasts (such as MAE, MAPE, RMSE) and understand their strengths and weaknesses,
  • evaluate distributional forecasts using metrics such as the Quantile Score, the Winkler Score, the Continuous Ranked Probability Score (CRPS), or the skill scores,
  • employ cross-validation while working with time series problems (a minimal sketch of benchmark models evaluated this way follows the list).
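The sketch below combines a few of those tools: simple benchmark models from fable, evaluated with time series cross-validation via stretch_tsibble() and accuracy(). The training window sizes are arbitrary choices for the example.

```r
# Sketch: naive benchmark models evaluated with time series cross-validation
library(fpp3)

beer <- aus_production %>%
  filter_index("1992 Q1" ~ .) %>%
  select(Quarter, Beer)

beer %>%
  stretch_tsibble(.init = 24, .step = 4) %>%  # expanding training windows
  model(
    Mean   = MEAN(Beer),                      # mean of all past observations
    Naive  = NAIVE(Beer),                     # last observed value
    SNaive = SNAIVE(Beer),                    # value from the same season one year ago
    Drift  = RW(Beer ~ drift())               # random walk with drift
  ) %>%
  forecast(h = 4) %>%
  accuracy(beer)                              # MAE, RMSE, MAPE, ... per benchmark
```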

Chapter 6: Judgemental forecasts

Judgemental forecasts (or adjustments to the statistical forecasts) are made manually by experts using their domain knowledge. Sometimes, we either cannot rely on statistical methods for forecasting (for example, due to lack of data) or would like to augment the statistical ones with domain knowledge. The latter can be especially helpful when experts know that something will change in the near future (e.g. a new policy or law), but its effect is not yet visible in the historical data.

The authors present a few approaches to judgemental forecasting, for example, the Delphi method or scenario forecasting. Additionally, they provide best practices on how to efficiently and correctly implement such forecasts in practice, while avoiding potential bias.

Chapter 7: Time series regression models

In this chapter, the authors explain the simplest yet very powerful model used for time series forecasting – the linear regression model (a minimal sketch follows this list). They cover topics such as:

  • the general model specification and the basic equation (for curious readers, the matrix form is also introduced),
  • the assumptions of the model and how to test them,
  • popular approaches to feature engineering,
  • evaluating the regression models using different criteria (adjusted R², AIC, BIC) and selecting the best features using the stepwise approach,
  • popular approaches to non-linear regression,
  • the dangers of confusing correlation and causation and why we should beware of multi-collinearity.
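A minimal version of such a model can be fitted with fable's TSLM(), using trend and seasonal dummy terms; the sketch below again relies on the bundled aus_production data.

```r
# Sketch: time series linear regression with a trend and seasonal dummies (TSLM)
library(fpp3)

fit <- aus_production %>%
  model(TSLM(Beer ~ trend() + season()))

report(fit)               # coefficients, adjusted R^2, AIC, BIC, ...
fit %>% gg_tsresiduals()  # residual plot, ACF of the residuals, and their histogram
```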

Chapter 8: Exponential smoothing

In the eighth chapter, the authors introduce the entire class of exponential smoothing models. They start with the classic models such as simple exponential smoothing, Holt’s linear trend model and the Holt-Winters seasonal model. Having done so, they go a step further to explain how state space models can be incorporated into the exponential smoothing framework. Using them, we are able not only to obtain point forecasts but also to quantify the uncertainty and get prediction intervals.
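In the fable package, this framework is exposed through ETS(), which can pick the error, trend, and seasonal terms automatically; a minimal sketch:

```r
# Sketch: automatic ETS (state space exponential smoothing) model selection
library(fpp3)

fit <- aus_production %>%
  model(ETS(Beer))           # error, trend, and seasonal components chosen automatically

report(fit)                  # the selected ETS specification and smoothing parameters
fit %>%
  forecast(h = "2 years") %>%
  autoplot(aus_production)   # point forecasts with 80%/95% prediction intervals
```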

Chapter 9: ARIMA models

Having described the exponential smoothing models, the authors proceed to the second of the two most popular approaches to time series forecasting – the ARIMA class of models. In contrast to the ES models, which rely on describing the trend and seasonality in the data, ARIMA models try to describe the autocorrelations present in the time series.

That is why in this chapter the authors focus on the concepts of stationarity and differencing. They also introduce the building blocks of ARIMA – the autoregressive and moving average models. While manually selecting the hyperparameters of an ARIMA model can be quite tricky, the authors introduce an algorithm for automatically selecting the best combination – the famous auto-ARIMA. It is definitely worthwhile to spend some time on this part, in order to understand how the algorithm traverses the possible hyperparameter space in search of the best fit.

Lastly, the authors cover the more advanced variant of the model, which is able to account for seasonality in the time series – the seasonal ARIMA (SARIMA).
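In fable, the automatic search is available through ARIMA(), which also handles the seasonal part of the specification; a minimal sketch:

```r
# Sketch: automatic (seasonal) ARIMA selection, analogous to auto-ARIMA
library(fpp3)

fit <- aus_production %>%
  model(ARIMA(Beer, stepwise = TRUE))  # stepwise search over (p,d,q)(P,D,Q) orders

report(fit)                            # the selected ARIMA specification and coefficients
fit %>%
  forecast(h = 8) %>%
  autoplot(aus_production)
```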

Chapter 10: Dynamic regression models

In the 10th chapter, the authors introduce another type of model – the dynamic regression models. To understand this kind of model intuitively, we can think of it as a combination of two other models mentioned in the book. The first one would be an ARIMA model, which predicts future observations using the past data. The second one is a general regression model, which can contain all the other information relevant for the predictions, for example, the effect of holidays, overall changes in the economy, etc.

Another useful technique we can learn about is dynamic harmonic regression, in which we model the (longer) seasonal pattern using Fourier terms, while the short-term time-series dynamics are handled by an ARMA-type error.
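In fable, such a model can be specified by adding fourier() terms to an ARIMA() formula; a rough sketch, where the number of Fourier pairs (K = 2) is an arbitrary choice for the example:

```r
# Sketch: dynamic harmonic regression - Fourier terms for the seasonal pattern,
# with the short-term dynamics captured by ARMA errors
library(fpp3)

fit <- aus_production %>%
  model(ARIMA(Beer ~ fourier(K = 2) + PDQ(0, 0, 0)))  # PDQ(0,0,0) disables the seasonal ARIMA part

fit %>%
  forecast(h = "2 years") %>%
  autoplot(aus_production)
```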

Chapter 11: Forecasting hierarchical or grouped time series

This chapter introduces the concept of aggregated time series. Let’s say you have a time series of total sales in an online store. The most common case for hierarchical time series is when you disaggregate the total series by geographical region. You might split the total sales by country and – if there is such a need – further break the series down by region, and so on. In other words, there is a clear, hierarchical manner of disaggregating the series step by step. And while forecasting such series, it is important that the lower levels add up as you move up the hierarchy – so the sum of all the regional sales adds up to the country level, etc.

For grouped time series, there is no unique hierarchical manner of disaggregating the time series. Following the example of the online store, you might split the total time series by product category and by price range. In such a case, both groupings can be used together to disaggregate the series, and neither is naturally nested within the other. This leads to a slightly more complicated scenario than the hierarchical one.

After introducing the two kinds of aggregated time series, the authors go over the most popular approaches to forecasting such series – the bottom-up, top-down, and middle-out approaches.
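In the fpp3 ecosystem, the aggregation structure is declared with aggregate_key() and the base forecasts are made coherent with reconcile(); a rough sketch using the tourism dataset bundled with fpp3:

```r
# Sketch: forecasting a hierarchy of series and reconciling the results
library(fpp3)

fit <- tourism %>%
  aggregate_key(State / Region, Trips = sum(Trips)) %>%  # national / state / region hierarchy
  model(ets = ETS(Trips)) %>%
  reconcile(
    bottom_up = bottom_up(ets),                          # sum the lowest-level forecasts upwards
    mint      = min_trace(ets, method = "mint_shrink")   # trace-minimisation reconciliation
  )

fit %>% forecast(h = "1 year")  # coherent forecasts at every level of the hierarchy
```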

Chapter 12: Advanced forecasting methods

After describing most of the basics that are useful for time series forecasting, the authors then present a few advanced approaches:

  • modeling complex seasonality (with multiple seasonal periods) – for example, daily data may have both weekly and annual patterns. To tackle this challenge, we might want to use the STL decomposition with multiple seasonal periods, dynamic harmonic regression, or the TBATS model (Exponential smoothing state space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components – yes, this is the full name of the model 😀 ).
  • Facebook’s Prophet model.
  • modeling directional relationships between multiple variables using vector autoregressions (VAR, not to be confused with the Value-at-Risk).
  • a brief introduction to deep learning and neural network autoregression (modeled using feed-forward NNs). What is especially interesting is the additional explanation of how to obtain prediction intervals for such a model (a short sketch follows this list).
  • bootstrapping and bagging – the authors describe how to use the block bootstrap, what the idea behind bagged ETS forecasts is, and the trade-off between accuracy and estimation time.
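For the neural network autoregression, fable provides NNETAR(); a rough sketch, in which the prediction intervals are produced by simulating future sample paths:

```r
# Sketch: neural network autoregression (NNAR) fitted with fable's NNETAR()
library(fpp3)

fit <- aus_production %>%
  model(NNETAR(Beer))        # feed-forward net trained on lagged values of the series

fit %>%
  forecast(h = 8) %>%        # intervals are based on simulated future sample paths
  autoplot(aus_production)
```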

Chapter 13: Some practical forecasting issues

In the very last chapter, the authors describe a selection of practical issues we might encounter while working with time series:

  • challenges and solutions to working with weekly and sub-daily data,
  • how to make sure the forecasts stay within certain bounds,
  • how to average the forecasts of different models described in the book to improve the overall accuracy,
  • what are the potential pitfalls of working with either very long or very short time series,
  • some ways of dealing with missing values (depending on why the data is missing) and outliers (simple ways of detecting them and cleaning the data).

Opinion

I have already praised the book a lot, but I honestly believe that the praise is well-deserved. What I really like about the book is its practical approach. If you are in a hurry or just want to play around with your own data, you can pretty much copy-paste the selected snippets and have an MVP of the model in a matter of minutes.

At the same time, you can use the book to get a more in-depth understanding of the domain. The authors were merciful enough to keep our brains from exploding from an overload of mathematical equations, so they only use formulas for conveying the essential information, while referring to other sources for more detailed derivations and deeper dives into the nitty-gritty details. This way, I believe the book is accessible to a wider audience. For me personally, it also serves as a point of reference when I would like to quickly review a particular topic, as I know it is written exhaustively, yet concisely and straight to the point.

Another great feature of the book is that it is continuously updated with the latest advancements in the field and new technologies. For example, the 3rd edition released this year added the description of the Prophet algorithm and a few more recent developments.

Lastly, the book comes with dedicated R packages containing all the methods described in the book (and more!). So you can also treat the book as extended documentation for the functions those packages provide. The authors also aligned the contents of the book (and the packages) with the tidy framework and the tidyverse set of packages.

Conclusions

I hope this article has encouraged you to read Forecasting: principles and practice. It is definitely a great resource for getting practical knowledge about time series forecasting, and I would recommend it not only as a starting point but also as a point of reference while working on a project involving time series.

As always, any constructive feedback is welcome. You can reach out to me on Twitter or in the comments.

If you liked this article, you might also be interested in one of the following:

Facebook’s Prophet + Deep Learning = NeuralProphet

Choosing the correct error metric: MAPE vs. sMAPE

5 free tools that increase my productivity

References

  • Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3.
