Full Code Template & Walkthrough

LSTM methodology, while introduced in the late 1990s, has only recently become a viable and powerful forecasting technique. Classical forecasting methods like ARIMA and Holt-Winters exponential smoothing (HWES) are still popular and powerful, but they lack the generalizability that memory-based models like the LSTM offer.
The main objective of this article is to lead you through building a working LSTM model. While the goal isn’t to compare new and classic modeling techniques head to head, I will discuss some advantages and disadvantages of classical vs. RNN-based techniques in the conclusion.
The full code is provided below. Provided you have the dataset and the necessary classes imported, the results should be entirely reproducible.
What Are LSTMs?
LSTM (Long Short-Term Memory) is a Recurrent Neural Network (RNN) based architecture that is widely used in natural language processing and time series forecasting. Brandon Rohrer’s video offers a great, intuitive introduction.
The LSTM rectifies a huge issue that recurrent neural networks suffer from: short memory. Using a series of ‘gates,’ each a small neural network in its own right, the LSTM decides what to keep, forget, or ignore at each time step based on learned weights that output values between 0 and 1.
LSTMs also help with the exploding and vanishing gradient problems. In simple terms, these problems arise because gradients are multiplied together repeatedly as errors are propagated back through many time steps during training. Those repeated multiplications compound, making the gradients either far too large (exploding) or far too small (vanishing). While exploding and vanishing gradients are huge downsides of traditional RNNs, the LSTM architecture severely mitigates these issues.
After a prediction is made, it is fed back into the model to predict the next value in the sequence, so some error is introduced with every step. To keep values from blowing up, they are ‘squashed’ (typically via sigmoid and tanh activation functions) before entering and leaving each gate. Below is a diagram of the LSTM architecture:

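The gate arithmetic described above can be written out in a few lines of NumPy. This is an illustrative single-timestep sketch, not the article’s code: the stacked weight matrices `W`, `U`, `b` and the `lstm_step` helper are hypothetical names for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    forget, input, candidate, and output gates (4 * hidden rows)."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations for all four gates
    f = sigmoid(z[0*hidden:1*hidden])     # forget gate: what to drop from memory
    i = sigmoid(z[1*hidden:2*hidden])     # input gate: how much new info to write
    g = np.tanh(z[2*hidden:3*hidden])     # candidate values, squashed to (-1, 1)
    o = sigmoid(z[3*hidden:4*hidden])     # output gate: what to expose downstream
    c = f * c_prev + i * g                # new cell state (long-term memory)
    h = o * np.tanh(c)                    # new hidden state (short-term output)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(size=(4 * n_hidden, n_in))
U = rng.normal(size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
h, c = lstm_step(rng.normal(size=n_in),
                 np.zeros(n_hidden), np.zeros(n_hidden), W, U, b)
```

Note how every value that reaches the hidden state passes through a sigmoid or tanh first, which is exactly the ‘squashing’ that keeps the recurrence numerically tame.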
TL;DR – Just Give Me The Code
Execution Script
Methods for Data Manipulation
Methods for Time Series
LSTM Methods
Code & Data Walkthrough – Data Prep
The data is the US/EU exchange rate from 2010 to present, not seasonally adjusted. You can pull the data from the St. Louis Federal Reserve.

Before building the model, we create a series and check for stationarity. While stationarity is not an explicit assumption of LSTM, it does help immensely in controlling error. A non-stationary series will introduce more error in predictions and force errors to compound faster.
We filter out one ‘sequence length’ of data points (60, in this case) for later validation.

The data format required for an LSTM is 3-dimensional, built with a moving window:
- The first sample is days 1–60 of the data.
- The second sample is days 2–61.
- The third sample is days 3–62, and so on.
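The moving-window construction above can be sketched with a small helper. `make_windows` is a hypothetical name, assuming a 1-D NumPy series and a 60-step window:

```python
import numpy as np

def make_windows(values, seq_len=60):
    """Turn a 1-D series into the 3-D (samples, timesteps, features)
    array an LSTM expects, sliding the window one step at a time."""
    X, y = [], []
    for start in range(len(values) - seq_len):
        X.append(values[start:start + seq_len])   # e.g. days 1-60, then 2-61, ...
        y.append(values[start + seq_len])         # the day after each window
    X = np.array(X)[..., np.newaxis]              # add a single-feature axis
    return X, np.array(y)

series = np.arange(100, dtype=float)
X, y = make_windows(series, seq_len=60)
print(X.shape, y.shape)   # (40, 60, 1) (40,)
```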

The last major step of prep is to scale the data; here we use a simple min-max scaler. Our sequence length remains 60 days for this part of the code.
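A sketch of the scaling step, assuming scikit-learn’s `MinMaxScaler` and stand-in data. The key details are fitting on the training data only and keeping the scaler around so predictions can be mapped back to real units later:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
train = rng.uniform(1.0, 1.5, size=(500, 1))   # stand-in exchange rates
valid = rng.uniform(1.0, 1.5, size=(60, 1))

scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)     # fit on training data only
valid_scaled = scaler.transform(valid)         # reuse the same min/max

# Predictions come back in [0, 1]; undo the scaling for real-unit results.
restored = scaler.inverse_transform(train_scaled)
```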

Code Walkthrough – Building the LSTM
The next part of the analysis is pretty straightforward code-wise. We first create a learning rate scheduler. This scheduler will monitor the validation loss and modify the learning rate on plateaus.
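In Keras, plateau-based learning-rate scheduling is provided by the `ReduceLROnPlateau` callback. A sketch with illustrative parameter values (the factor and patience shown here are assumptions, not the article’s settings):

```python
from tensorflow import keras

# Halve the learning rate when validation loss stops improving
# for 5 consecutive epochs; never go below 1e-6.
lr_scheduler = keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1,
)
# Passed to training via: model.fit(..., callbacks=[lr_scheduler])
```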
We create a method to reset all of the weights in case we want to re-train with different parameters (the method is unused in my code but it’s there if you need it).
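One simple way to make weights resettable in Keras is to snapshot the freshly initialized weights and restore them on demand. A minimal sketch on a toy model; `reset_weights` and the tiny Dense network are illustrative stand-ins:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])

# Snapshot the freshly initialized weights; restoring them later
# resets the model without rebuilding it.
initial_weights = [w.copy() for w in model.get_weights()]

def reset_weights(model, weights):
    model.set_weights(weights)

# ... train, inspect, then start over with different hyperparameters:
reset_weights(model, initial_weights)
```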
We then build an LSTM with 2 layers of 100 nodes each, followed by an output layer with a sigmoid activation.
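A sketch of such a model in Keras, assuming the 60-step, single-feature input described earlier. The layer sizes follow the text; everything else (optimizer, loss) is an assumption:

```python
from tensorflow import keras

seq_len = 60
model = keras.Sequential([
    keras.Input(shape=(seq_len, 1)),                 # (timesteps, features)
    keras.layers.LSTM(100, return_sequences=True),   # pass full sequence onward
    keras.layers.LSTM(100),
    keras.layers.Dense(1, activation="sigmoid"),     # matches min-max scaled targets
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.summary()
```

The sigmoid output only makes sense because the targets were min-max scaled into [0, 1] beforehand.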
The history object stores the model’s loss at each epoch, which can be plotted to evaluate whether the training process needs adjustment.
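Plotting the loss curves might look like the sketch below; the `history` dict here is a stand-in for the `.history` attribute of the object Keras returns from `model.fit`:

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend for scripts
import matplotlib.pyplot as plt

# Stand-in for model.fit(...).history; plug in your own History object.
history = {"loss": [0.09, 0.05, 0.03, 0.02, 0.018],
           "val_loss": [0.10, 0.06, 0.04, 0.035, 0.036]}

plt.plot(history["loss"], label="training loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.savefig("loss_curve.png")
# Validation loss rising while training loss falls is a common sign of overfitting.
```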



Code Walkthrough – Predicting the Future
The class provided allows us to both test model accuracy and to predict the future.
The method below tests our model against the validation data frame we previously set aside. The results are pretty promising.
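One-step-ahead validation can be sketched generically: predict each held-out point from the true preceding window, then score with RMSE. The naive last-value predictor below stands in for the trained LSTM so the sketch runs on its own:

```python
import numpy as np

def one_step_predictions(model_predict, series, seq_len=60):
    """Predict each validation point from the true preceding window
    (one-step-ahead testing, as opposed to recursive forecasting)."""
    preds = []
    for start in range(len(series) - seq_len):
        window = series[start:start + seq_len]
        preds.append(model_predict(window))
    return np.array(preds)

# Stand-in for the trained LSTM: a naive "last value" predictor.
naive = lambda window: window[-1]

rng = np.random.default_rng(1)
valid = 1.2 + 0.01 * np.cumsum(rng.normal(size=120))
preds = one_step_predictions(naive, valid, seq_len=60)
actual = valid[60:]
rmse = np.sqrt(np.mean((preds - actual) ** 2))
print(f"validation RMSE: {rmse:.4f}")
```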


If we are happy with the accuracy, we can then predict the future. Below is a quick demonstration of how an LSTM predicts. You can see how some error is introduced into the model at each time step, but since each data point is a series of historical points, the error is lessened.
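The recursive prediction loop described above can be sketched as follows; the moving-average lambda is a stand-in for the trained model, and `forecast` is a hypothetical helper name:

```python
import numpy as np

def forecast(model_predict, history, seq_len=60, horizon=30):
    """Predict `horizon` steps ahead by feeding each prediction
    back into the input window (recursive forecasting)."""
    window = list(history[-seq_len:])
    preds = []
    for _ in range(horizon):
        next_val = model_predict(np.array(window))
        preds.append(next_val)
        window = window[1:] + [next_val]   # slide: drop oldest, append prediction
    return np.array(preds)

# Stand-in for the trained LSTM: a moving-average predictor.
avg = lambda window: float(window.mean())

history = np.linspace(1.10, 1.20, 200)
future = forecast(avg, history, seq_len=60, horizon=30)
print(future.shape)   # (30,)
```

Because each prediction re-enters the input window, any error it carries is diluted across the other 59 (true or previously predicted) points, which is why the compounding stays manageable over short horizons.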



Conclusion
Since this article is mainly about building an LSTM, I didn’t discuss many advantages / disadvantages of using an LSTM over classical methods. I’d like to offer some guidelines in this conclusion:
Non-Technical Considerations
- It is very important to weigh the costs and benefits of using a complex model vs. a simpler model. A slight boost in accuracy may not be worth the time, effort and loss of interpretability introduced by LSTMs.
- If time-series forecasting were easy, stock markets would be solved! There is an inherent element of noise in all time-series data that we cannot feasibly capture, regardless of how great our model is.
- Don’t ignore intuition. Time series models implicitly assume that previous time periods dictate future time periods. This may not always be the case. If your results don’t make sense intuitively, don’t be afraid to ditch the model!
Technical Considerations
- ARIMA (and MA-based models in general) are designed for time series data, while RNN-based models are designed for sequence data. Because of this distinction, RNN-based models take more work to apply out of the box.
- ARIMA models are tuned with parameters specific to one series, so they don’t generalize well: an ARIMA fitted on one dataset may not return accurate results on another. RNN-based models make fewer structural assumptions and are more generalizable.
- Depending on window size, data, and desired prediction time, LSTM models can be very computationally expensive. Sometimes they’re not feasible without powerful cloud computing.
- It’s good practice to have a ‘no-skill’ model to compare results against. A good start is to compare the model’s results to a model that predicts only the mean for each time step over the period (a horizontal line).
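A no-skill comparison can be sketched in a few lines; the series and ‘model’ outputs below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(7)
actual = 1.15 + 0.01 * np.cumsum(rng.normal(size=60))    # validation period
model_preds = actual + rng.normal(scale=0.005, size=60)  # stand-in model output

def rmse(pred, true):
    return np.sqrt(np.mean((pred - true) ** 2))

# No-skill baseline: predict the mean at every time step (a horizontal line).
baseline = np.full_like(actual, actual.mean())

print(f"model RMSE:    {rmse(model_preds, actual):.4f}")
print(f"baseline RMSE: {rmse(baseline, actual):.4f}")
# The model only adds value if its RMSE beats the baseline's.
```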