Exploring the LSTM Neural Network Model for Time Series

Practical, straightforward implementation with the scalecast library

Michael Keith
Towards Data Science



One of the most advanced models out there to forecast time series is the Long Short-Term Memory (LSTM) Neural Network. According to Korstanje in his book, Advanced Forecasting with Python:

“The LSTM cell adds long-term memory in an even more performant way because it allows even more parameters to be learned. This makes it the most powerful [Recurrent Neural Network] to do forecasting, especially when you have a longer-term trend in your data. LSTMs are one of the state-of-the-art models for forecasting at the moment,” (2021).

That’s the good news. The bad news, as you know if you have worked with LSTMs in TensorFlow, is that designing and implementing a useful model is not always straightforward. There are many excellent tutorials online, but most of them don’t take you from point A (reading in a dataset) to point Z (extracting useful, appropriately scaled, future forecasted points from the completed model). A lot of tutorials I’ve seen stop after displaying a loss plot from the training process, as evidence of the model’s accuracy. That is useful, and anyone who offers their wisdom on this subject has my gratitude, but it’s not complete.

There is another way.

The scalecast library hosts a TensorFlow LSTM that can easily be employed for time series forecasting tasks. The package was designed to take a lot of the headache out of implementing time series forecasts, and it uses TensorFlow under the hood. Here are some reasons you should try it out:

  • Easy to implement and view results with most data pre- and post-processing performed behind the scenes, including scaling, un-scaling, and evaluating confidence intervals
  • Testing the model is automatic — the model fits once on training data then again on the full time series dataset (this helps prevent overfitting and gives a fair benchmark to compare many approaches)
  • Validating and viewing loss during each training epoch on validation data, similar to TensorFlow, is possible and easy
  • Benchmarking against other modeling concepts, including Facebook Prophet and Scikit-learn models, is possible and easy

There are also some reasons you might stay away:

  • Because all models are fit twice, training an already-sophisticated model can be twice as slow
  • You do not have access to all the tools to intervene in the model that working with TensorFlow directly would offer
  • With a lesser-known package, you never know what unforeseen errors and issues may arise

Hopefully that gives you enough to decide whether reading on will be worth your time. With that out of the way, let’s get into a tutorial, which you can find in notebook form here.

Data Preprocessing

First, we install the library:

pip install scalecast --upgrade

You will also need tensorflow (for Windows) or tensorflow-macos (for Mac).

pip install tensorflow

or

pip install tensorflow-macos

Next, let’s import the library and read in the data (which is available on Kaggle with an Open Database license):

import pandas as pd
import numpy as np
import pickle
import seaborn as sns
import matplotlib.pyplot as plt
from scalecast.Forecaster import Forecaster

df = pd.read_csv('AirPassengers.csv',parse_dates=['Month'])

This dataset captures 12 years of monthly air passenger data for an airline: it starts in January 1949 and ends in December 1960. It is a good example dataset for forecasting because it has a clear trend and seasonal patterns. Let’s take a look at it visually:
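The plotting code itself is not shown in this post; a minimal sketch using the imports from above might look like this:

# a quick look at the raw series (sketch only; not the exact code behind the
# figure in the original post)
sns.lineplot(x='Month', y='#Passengers', data=df)
plt.title('Monthly Airline Passengers, 1949-1960')
plt.show()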


Exploratory Data Analysis

To begin forecasting with scalecast, we must first instantiate a Forecaster object with the y and current_dates parameters specified, like so:

>>> f = Forecaster(
...     y=df['#Passengers'],
...     current_dates=df['Month']
... )
>>> f

Forecaster(
    DateStartActuals=1949-01-01T00:00:00.000000000
    DateEndActuals=1960-12-01T00:00:00.000000000
    Freq=MS
    ForecastLength=0
    Xvars=[]
    Differenced=0
    TestLength=1
    ValidationLength=1
    ValidationMetric=rmse
    CILevel=0.95
    BootstrapSamples=100
)

Let’s examine this time series by viewing the PACF (Partial Autocorrelation Function) plot, which measures how strongly the y variable, in our case air passengers, is correlated with past values of itself and how far back a statistically significant correlation exists. The PACF plot is different from the ACF plot in that the PACF controls for the correlation explained by intervening lags. It is good to view both, and both are called in the notebook I created for this post, but only the PACF will be displayed here.

f.plot_pacf(lags=26)
plt.show()

From this plot, it looks like a statistically significant correlation may exist up to two years (24 monthly lags) back in the data. That will be good information to use when modeling. Next, let’s decompose the series into its trend, seasonal, and residual parts:

f.seasonal_decompose().plot()
plt.show()

We see a clear linear trend and strong seasonality in this data. The residuals appear to follow a pattern too, although it’s not clear what kind (hence why they are residuals).

Finally, let’s test the series’ stationarity.

>>> stat, pval, _, _, _, _ = f.adf_test(full_res=True)
>>> stat
0.8153688792060569
>>> pval
0.9918802434376411

Checking a series’ stationarity is important because most time series methods do not model non-stationary data effectively. A non-stationary series is one whose statistical properties, such as the mean, change over time; rather than reverting to a stable mean, it continues steadily upwards or downwards throughout the series’ timespan. In our case, the trend is pretty clearly non-stationary, as it increases year-after-year, but the results of the Augmented Dickey-Fuller test give statistical justification to what our eyes see. Since the p-value is not less than 0.05, we fail to reject the null hypothesis and must treat the series as non-stationary.
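If you want to sanity-check this result outside of scalecast, the same test is available from statsmodels directly (to my knowledge, scalecast’s adf_test wraps it, so the numbers should closely match, depending on lag-selection defaults):

from statsmodels.tsa.stattools import adfuller

# first two return values are the test statistic and the p-value
stat, pval = adfuller(df['#Passengers'])[:2]
print(stat, pval)  # should closely match the values shown above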

All of this preamble can seem redundant at times, but it is a good exercise to explore the data thoroughly before attempting to model it. In this post, I’ve cut the exploration phase down to a minimum, but I would feel negligent if I didn’t do at least this much.

LSTM Forecasting

To model anything in scalecast, we need to complete the following three basic steps:

  1. Specify a test length — The test length is a discrete number of the last observations in the full time series. You can pass a percentage or a discrete number to the set_test_length function. In more recent scalecast versions, testing can be skipped by setting a test length of 0.
  2. Generate future dates — The number of dates you generate in this step will determine how long all models will be forecast out.
  3. Choose an estimator — we will be using the “lstm” estimator.

To accomplish these steps, see the below code:

f.set_test_length(12)       # 1. 12 observations to test the results
f.generate_future_dates(12) # 2. 12 future points to forecast
f.set_estimator('lstm')     # 3. LSTM neural network

Now, to call an LSTM forecast. By default, this model runs with a single LSTM layer of size 8, the Adam optimizer, tanh activation, a single lagged dependent-variable value to train with, a learning rate of 0.001, and no dropout. All data is scaled with a min-max scaler going into the model and un-scaled coming out. Anything you can pass to the fit() method in TensorFlow, you can also pass to the scalecast manual_forecast() method.

f.manual_forecast(call_me='lstm_default')
f.plot_test_set(ci=True)
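For reference, the default call above should be roughly equivalent to spelling out every hyperparameter explicitly. This is a sketch only, built from the description of the defaults and using parameter names that appear later in this post (the nickname is hypothetical):

f.manual_forecast(
    call_me='lstm_default_explicit', # hypothetical nickname
    lags=1,                          # a single lagged dependent-variable value
    lstm_layer_sizes=(8,),           # one LSTM layer of size 8
    dropout=(0,),                    # no dropout
    activation='tanh',
    optimizer='Adam',
    learning_rate=0.001,
)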

Predictably, this model did not perform well. But just the fact we were able to obtain results that easily is a huge start. Fine-tuning it to produce something useful should not be too difficult.

Let’s start simple and just give it more lags to predict with. We saw significant autocorrelation out to 24 months in the PACF, so let’s use that:

f.manual_forecast(call_me='lstm_24lags',lags=24)
f.plot_test_set(ci=True)

Already, we see some noticeable improvements, but this is still not even close to ready. An obvious next step might be to give it more time to train. In this universe, more time means more epochs. Let’s see where five epochs gets us. We also validate the model while it’s training by specifying validation_split=.2 below:

f.manual_forecast(
    call_me='lstm_24lags_5epochs',
    lags=24,
    epochs=5,
    validation_split=.2,
    shuffle=True,
)
f.plot_test_set(ci=True)

Again, closer. A couple of values even fall within the 95% confidence interval this time. Next, let’s try increasing the number of layers in the network to 3 and the epochs to 25, while monitoring the validation loss and telling the model to quit once the loss fails to improve for 5 consecutive epochs. This is known as early stopping.

from tensorflow.keras.callbacks import EarlyStopping

f.manual_forecast(
    call_me='lstm_24lags_earlystop_3layers',
    lags=24,
    epochs=25,
    validation_split=.2,
    shuffle=True,
    callbacks=EarlyStopping(
        monitor='val_loss',
        patience=5,
    ),
    lstm_layer_sizes=(16,16,16),
    dropout=(0,0,0),
)

f.plot_test_set(ci=True)

Again, slow improvement. By now, you may be getting tired of seeing all this modeling process laid out like this. Just find me a model that works! So, I’m going to skip ahead to the best model I was able to find using this approach. See the code:

f.manual_forecast(
    call_me='lstm_best',
    lags=36,
    batch_size=32,
    epochs=15,
    validation_split=.2,
    shuffle=True,
    activation='tanh',
    optimizer='Adam',
    learning_rate=0.001,
    lstm_layer_sizes=(72,)*4,
    dropout=(0,)*4,
    plot_loss=True,
)
f.plot_test_set(order_by='LevelTestSetMAPE',models='top_2',ci=True)

That took a long time to come around to, longer than I’d like to admit, but finally we have something that is somewhat decent. All but two of the actual points fall within the model’s 95% confidence intervals. It only has trouble predicting the highest points of the seasonal peak. It is now a model we could think about employing in the real world.

MLR Forecasting and Model Benchmarking

Now that we finally found an acceptable LSTM model, let’s benchmark it against a simple model, the simplest model, Multiple Linear Regression (MLR), to see just how much time we wasted.

To switch from an LSTM to an MLR model in scalecast, we need to follow these steps:

  1. Choose the MLR estimator — just like how we previously chose the LSTM estimator.
  2. Add regressors to the model — in LSTM, we only used the series’ own history and let the model parameterize itself. With MLR, we can still use the series’ own history, but we can also add information about which month, quarter, or year any given observation falls into to capture seasonality and a time trend (among other options). We could even ingest a dataframe of our own regressors (not shown here).
  3. Difference non-stationary data — this is how we mitigate the results of the Augmented Dickey-Fuller test showing we had non-stationary data. We could have done this with LSTM as well, but we were hoping it was sophisticated enough to work without this step.

This is all accomplished in the code below:

from scalecast.SeriesTransformer import SeriesTransformer

transformer = SeriesTransformer(f)
f = transformer.DiffTransform() # step 3: difference the series to make it stationary

f.add_ar_terms(24)                                      # step 2: the series' own history (lags 1-24)
f.add_seasonal_regressors('month','quarter',dummy=True) # step 2: month/quarter dummies for seasonality
f.add_seasonal_regressors('year')
f.add_time_trend()                                      # step 2: a linear time trend

Now, we run the forecast and view test-set performance of the MLR against the best LSTM model:

f.set_estimator('mlr')
f.manual_forecast()

f = transformer.DiffRevert(
    exclude_models = [m for m in f.history if m != 'mlr']
) # exclude all lstm models from the revert

f.plot_test_set(order_by='TestSetMAPE',models=['lstm_best','mlr'])
plt.title('Top-2 Models Test-set Performance - Level Data',size=16)
plt.show()

Absolutely incredible. With the simplest model available to us, we quickly built something that outperforms the state-of-the-art model by a mile. Maybe you could find something using the LSTM model that is better than what I found; if so, please leave a comment and share your code. But I’ve forecasted enough time series to know that it would be difficult to outpace the simple linear model in this case. Maybe, because of the dataset’s small size, the LSTM model was never appropriate to begin with.

We can then see our models’ predictions on future data:

f.plot(
    models=['mlr','lstm_best'],
    order_by='LevelTestSetMAPE',
    level=True,
)

We can also see the error and accuracy metrics from all models on out-of-sample test data:

f.export('model_summaries',determine_best_by='LevelTestSetMAPE')[
    ['ModelNickname',
     'LevelTestSetMAPE',
     'LevelTestSetRMSE',
     'LevelTestSetR2',
     'best_model']
]

The scalecast package uses a dynamic forecasting and testing method that propagates AR/lagged values with its own predictions, so there is no data leakage.
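To illustrate the idea, here is a conceptual sketch of dynamic (recursive) forecasting — not scalecast’s actual internals. Each prediction is fed back in as the next step’s lag, so test-period actuals never appear in the model’s inputs:

def dynamic_forecast(predict_one, last_lags, steps):
    """Conceptual sketch of dynamic/recursive multi-step forecasting.
    predict_one: callable mapping a list of lags to a single prediction.
    last_lags: the most recent observed values, oldest first.
    steps: number of future periods to forecast."""
    history = list(last_lags)
    n_lags = len(last_lags)
    preds = []
    for _ in range(steps):
        yhat = predict_one(history[-n_lags:]) # inputs may include earlier predictions
        preds.append(yhat)
        history.append(yhat)                  # propagate the prediction, never an actual
    return preds

# toy example: a "model" that just averages its inputs
print(dynamic_forecast(lambda lags: sum(lags)/len(lags), [100, 110, 120], 3))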

Conclusion

I hope you enjoyed this quick overview of how to model with LSTM in scalecast. Hopefully you learned something. My takeaway is that it is not always prudent to move immediately to the most advanced method for any given problem. The simpler models are often better, faster, and more interpretable.


Works Cited

Korstanje, J. (2021). LSTM RNNs. In J. Korstanje, Advanced Forecasting with Python (pp. 243–251). Berkeley, CA: Apress.
