How to Develop Interpretable Time Series Forecasts with Deep Learning

A concise and thorough summary of NeuralProphet.

Michael Berk
Towards Data Science

--

Time series forecasting sucks. It’s cumbersome and requires both subject matter and technical expertise. That is, until now.

Figure 1: NeuralProphet quick start forecast. Image by author.

In 2020, researchers at Stanford and Facebook retooled the Prophet algorithm to include a deep learning component. The main selling point is the reported accuracy improvement of 55–92%. The deep learning portion of the model is built on top of PyTorch, so it is easily extendable. Run time increased by about 4x on average, but time series forecasts are rarely needed in real time, so run time isn’t a major issue.

If you need an interpretable yet powerful time series forecast, NeuralProphet might be your best option. Here’s an implementation example.

Let’s dive in.

Technical TLDR

NeuralProphet is a deep learning extension of the original Prophet library. The GAM-structure of the model remains unchanged and we simply include several deep learning terms. Those terms are for lagged covariates, future (forecasted) covariates, and autoregression. There are three neural network configurations with increasing complexity described below.

But, what’s actually going on?

Ok, let’s slow down a bit. We’re going to start from square one and assume you don’t know anything about Facebook Prophet.

1 — What is Facebook Prophet?

The initial Facebook Prophet algorithm (2017) is a very lightweight yet effective time series forecasting model. It was built to be easy to use and interpretable, descriptions that are rarely associated with time series modeling.

According to the original paper, the model succeeded because the researchers reframed time series forecasting as a curve-fitting problem instead of an autoregressive one. Many prior models, such as ARIMA, fit lagged values of the data instead of trying to find the functional form of the trend.

Figure 2: initial Facebook Prophet algorithm terms. Image by author.

The model has three main components as shown in figure 2. T(t) corresponds to the trend of our time series after seasonality has been removed. S(t) corresponds to our seasonality, whether it be weekly, monthly, or yearly. And finally, E(t) corresponds to pre-specified events and holidays.

There is a fitting process for each of these components and, once fit, they often combine to produce a reliable forecast.

For a more visual representation of these components, here’s a decomposition plot from Prophet’s documentation.

Figure 3: trend (top), events (middle), and weekly seasonality (bottom) for the Peyton Manning Wikipedia page-view data used in Prophet’s documentation (src). Image by author.

Now that we have some foundation on the predecessor of NeuralProphet, let’s move on.

2 — How does NeuralProphet work?

NeuralProphet adds three components to our original framework, as shown on the second line of figure 4.

Figure 4: NeuralProphet algorithm terms. Image by author.

The first three terms remain mostly unchanged between the two models. The final three are deep learning terms that differentiate the new model from the old. Let’s take a look at each one in turn.
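Written out, the two lines of figure 4 amount to a single additive equation. This is only a sketch built from the terms named in this post; the hat on y (marking the forecast) is our own notation.

```latex
\hat{y}(t) = T(t) + S(t) + E(t) + F(t) + A(t) + L(t)
```

Dropping F(t), A(t), and L(t) leaves the three-term Prophet model from figure 2.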

2.1 — Trend T(t)

Trend is unchanged from the prior Prophet model. In short, we look to model trend using either logistic or linear growth functions. Below we take a look at logistic growth (figure 5):

Figure 5: logistic growth equation used in Prophet. C is the carrying capacity, k is the growth rate, and m is the offset parameter. Image by author.

Logistic growth is a very traditional and well-accepted solution. Where the original Prophet model innovates is in allowing the parameters of the function to change over time. These change points are dynamically determined by the model and give a lot more freedom to the otherwise static growth rate and offset parameters.
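To make the trend term concrete, here is a minimal sketch in plain Python. The function names and the example change points are ours, not Prophet's API. To keep it short, the change points are shown on the linear trend, where each one simply adds a slope adjustment from that point onward.

```python
import math


def logistic_trend(t, capacity, rate, offset):
    """Logistic growth curve: C / (1 + exp(-k * (t - m)))."""
    return capacity / (1.0 + math.exp(-rate * (t - offset)))


def piecewise_linear_trend(t, rate, offset, changepoints, deltas):
    """Linear trend whose slope changes by deltas[j] after changepoints[j].

    The curve stays continuous because each adjustment only acts on the
    time elapsed since its change point.
    """
    trend = offset + rate * t
    for s, delta in zip(changepoints, deltas):
        if t > s:
            trend += delta * (t - s)
    return trend


# Halfway point of the logistic curve sits at t == offset.
print(logistic_trend(0.0, capacity=10.0, rate=1.0, offset=0.0))  # 5.0

# Slope 1.0 until t=2, then 0.5 afterwards (hypothetical change point).
print(piecewise_linear_trend(5.0, 1.0, 0.0, changepoints=[2.0], deltas=[-0.5]))
```

In the real model the change point locations and slope adjustments are fit from data rather than supplied by hand.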

2.2 — Seasonality S(t)

Figure 6: a component plot of yearly seasonality (src). Image by author.

Seasonality is defined to be variations that occur at specific regular intervals. It’s notoriously difficult to account for because it can take so many forms.

The initial developers of the model came up with another great idea: instead of trying to model seasonality with autoregression (i.e., lagging data), they tried to model seasonality’s curve. And that’s where Fourier Series come in.

A Fourier Series is just a bunch of sinusoids summed together, which can be used to approximate nearly any periodic curve. Once we have the functional form of our data’s daily, weekly, monthly, etc. seasonality, we can simply add those terms to our model and accurately forecast future seasonality.
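Here is a rough sketch of what "a bunch of sinusoids summed together" looks like in code. The function names and the coefficient values are made up for illustration; in the real model the coefficients are learned during fitting.

```python
import math


def fourier_features(t, period, order):
    """Return [sin, cos] pairs for harmonics 1..order at time t."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


def seasonality(t, period, coeffs):
    """Weighted sum of Fourier features; len(coeffs) must be 2 * order."""
    order = len(coeffs) // 2
    feats = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coeffs, feats))


# Hypothetical weekly seasonality with two harmonics (four coefficients).
weekly_coeffs = [0.8, -0.3, 0.1, 0.05]
for day in range(7):
    print(day, round(seasonality(day, period=7.0, coeffs=weekly_coeffs), 3))
```

Because every feature repeats with the chosen period, evaluating the function at a future date needs nothing beyond the date itself.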

2.3 — Events E(t)

The final term in the initial Prophet model handles events.

Seasonality and events are handled similarly, as additive terms fit alongside the trend. However, instead of a smooth Fourier curve, the original Prophet paper encodes events as indicator variables that switch on around each specified date, so the fitted curve is very spiky. And, because event dates are known in advance, the term is easily extended into the future.
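As a small illustration of how a spiky event effect can be represented, here is a sketch that assumes an indicator encoding: the effect is switched on within a window around each event date and is zero elsewhere. The function name, the day indices, and the effect size are all hypothetical.

```python
def event_effect(day, event_days, effect, window=0):
    """Return `effect` if `day` is within `window` days of any event, else 0.0."""
    if any(abs(day - e) <= window for e in event_days):
        return effect
    return 0.0


# Hypothetical event on day 25 of the series, repeating on day 390.
event_days = [25, 390]
print([event_effect(d, event_days, effect=3.0, window=1) for d in range(22, 29)])
```

Since the event dates are specified up front, the same function works for dates beyond the end of the training data.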

Now let’s move on to the new model.

2.4 — Regressors F(t), L(t)

One of the powerful aspects of the Prophet and NeuralProphet models is that they allow for covariates. Most time series forecasting models are univariate, although they do sometimes offer multivariate versions (for example, ARIMA vs. MARIMA).

When handling covariates with time series forecasts, we need to ensure that those covariates will be present n time periods in advance, or else our model has nothing to forecast with. We can do this by lagging current covariates by n time periods, which is modeled by the L(t) term, or developing a forecast for those covariates, which is modeled by the F(t) term.
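The lagging option is easy to picture with a tiny sketch. The helper below is our own illustration, not part of either library: it shifts a covariate n steps forward so that the value aligned with time t is the one observed at time t - n.

```python
def lag_series(values, n):
    """Shift a series n steps into the future, padding the start with None."""
    if n < 0:
        raise ValueError("n must be non-negative")
    pad = min(n, len(values))
    return [None] * pad + values[: len(values) - pad]


# Hypothetical covariate (e.g. daily temperature) lagged by two periods.
temperature = [18.0, 21.0, 19.5, 23.0, 22.0]
print(lag_series(temperature, 2))  # [None, None, 18.0, 21.0, 19.5]
```

The first n rows have no lagged value and would typically be dropped before fitting.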

Once we have our respective covariates, we then throw deep learning at it (section 3).

2.5 — Auto-Regression A(t)

Finally, autoregression is the concept of looking back on prior values and using those as predictors for future values. The original Prophet model was so effective because it steered away from this philosophy; however, to leverage deep learning we must return to it.

The autoregression term uses lagged values of the series itself to predict future values. In practice we rarely use covariates, so this is where most of NeuralProphet’s power comes from.
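To see what "lagged values as predictors" means in practice, here is a sketch that turns a series into (inputs, targets) training pairs. The helper is hypothetical; its argument names are chosen only to be descriptive.

```python
def make_ar_samples(series, n_lags, n_forecasts=1):
    """Slide a window over the series: n_lags inputs -> n_forecasts targets."""
    samples = []
    for i in range(n_lags, len(series) - n_forecasts + 1):
        inputs = series[i - n_lags : i]
        targets = series[i : i + n_forecasts]
        samples.append((inputs, targets))
    return samples


for inputs, targets in make_ar_samples([1, 2, 3, 4, 5], n_lags=2):
    print(inputs, "->", targets)
```

Each pair is one training example for the networks described in the next section.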

With that structure, let’s zoom in on the deep learning models used by NeuralProphet.

3 — Deep Learning Model(s)

NeuralProphet is built on top of PyTorch and AR-Net, so its modules are easily customizable and extendable.

There are a few configurations. The first is Linear AR, which is just a single layer neural network (NN) with no biases or activation functions. It’s very lightweight and regresses a particular lag onto a particular forecast step, which makes interpreting the model quite easy.
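A single layer with no biases or activations is just a matrix of weights, which is why it is so easy to read. The sketch below uses plain Python lists instead of PyTorch, and the weights are made up for illustration.

```python
def linear_ar_forecast(lags, weights):
    """weights[h][j] maps lag j onto forecast step h; no bias, no activation."""
    return [sum(w * x for w, x in zip(row, lags)) for row in weights]


# Two lags in, two forecast steps out (hypothetical weights).
weights = [
    [0.5, 0.5],  # step 1 averages the two lags
    [0.0, 1.0],  # step 2 copies the most recent lag
]
print(linear_ar_forecast([1.0, 2.0], weights))  # [1.5, 2.0]
```

Reading a row of the matrix tells you directly how much each lag contributes to that forecast step.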

Deep AR is a fully connected NN with a specified number of hidden layers and ReLU activation functions. The added complexity over Linear AR means longer training time and less interpretability, but it often improves forecasting accuracy. It’s also worth noting that you can approximate the information carried by the Linear AR weights by summing the absolute weights of the first layer for each input position. It’s not perfect, but it’s better than nothing.
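Here is a bare-bones sketch of both ideas: a fully connected forward pass with ReLU, and the per-input sum of absolute first-layer weights. It uses plain Python lists rather than PyTorch, and every name and weight in it is hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weights[i] is the row for output unit i."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def deep_ar_forecast(lags, hidden_layers, output_layer):
    """Pass the lags through ReLU hidden layers, then a linear output layer."""
    h = lags
    for weights, biases in hidden_layers:
        h = relu(dense(h, weights, biases))
    out_weights, out_biases = output_layer
    return dense(h, out_weights, out_biases)


def input_importance(first_layer_weights):
    """Sum of absolute first-layer weights for each input position."""
    n_inputs = len(first_layer_weights[0])
    return [sum(abs(row[j]) for row in first_layer_weights)
            for j in range(n_inputs)]


hidden = [([[1.0, -1.0], [0.5, 0.0]], [0.0, 0.0])]
output = ([[1.0, 1.0]], [0.0])
print(deep_ar_forecast([2.0, 1.0], hidden, output))  # [2.0]
print(input_importance(hidden[0][0]))                # [1.5, 1.0]
```

The importance scores say which lags the network leans on, but not in which direction or through which interactions, which is why they are only an approximation.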

Sparse AR is an extension of Deep AR. For the autoregressive piece, it’s often best to use an AR of high order (more values at prior time steps) and add a regularization term. By feeding in more lags and letting regularization automatically shrink the unimportant ones during fitting, we are more likely to find signal.
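To illustrate how regularization can remove the importance of unhelpful lags, here is a sketch that assumes a plain L1 penalty with a soft-thresholding step. That is a stand-in we chose for illustration; the post does not say which regularizer NeuralProphet uses, and the weights below are invented.

```python
def l1_penalty(weights, strength):
    """Extra loss added for non-zero weights (assumed L1 form)."""
    return strength * sum(abs(w) for w in weights)


def soft_threshold(weights, threshold):
    """Shrink every weight toward zero; weights below the threshold become 0."""
    shrunk = []
    for w in weights:
        if abs(w) <= threshold:
            shrunk.append(0.0)
        elif w > 0:
            shrunk.append(w - threshold)
        else:
            shrunk.append(w + threshold)
    return shrunk


# Four lag weights: two carry signal, two are close to noise.
lag_weights = [1.0, -0.125, 0.0625, -0.75]
print(l1_penalty(lag_weights, strength=0.5))
print(soft_threshold(lag_weights, threshold=0.25))  # [0.75, 0.0, 0.0, -0.5]
```

The small weights drop to exactly zero while the large ones survive, which is the sparsity effect the paragraph above describes.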

Any of the above three methods can be implemented with both covariates and auto-regressed values.

Summary

And there you have it, NeuralProphet in all its glory!

To hammer home the concepts, we’re going to quickly summarize.

NeuralProphet is a deep learning extension of Facebook Prophet. It adds on to the prior model by including deep learning terms on both covariates and data in the time series.

The initial model (Prophet) leverages curve-fitting, which was a novel approach to time series forecasting. It provided unparalleled out-of-the-box performance and interpretability, but some problems need more modeling power. NeuralProphet adds deep learning terms to Prophet, which are governed by three neural net configurations. NeuralProphet significantly improves model fitting capacity, but increases run time and reduces explainability.

If Facebook Prophet isn’t cutting it, try NeuralProphet.

Thanks for reading! I’ll be writing 25 more posts that bring academic research to the DS industry. Check out my comment for links to the main source for this post and some useful resources.
