
Training Time Series Forecasting Models in PyTorch

Lessons learned from training hundreds of PyTorch time series forecasting models in many different domains.

Photo from Unsplash

Over the past year I’ve used Flow Forecast to train hundreds of PyTorch time series forecasting models on a wide variety of datasets (river flow, COVID-19, solar/wind power, and stock prices). Beginners often come to me asking what they should do first. This article is a brief breakdown of some basic tips that you can use when training a time series forecasting model.

Frame your problem

I frequently see a lot of different terms thrown around with respect to time series ML techniques. Here I will attempt to clarify them:

  • Anomaly detection: This is a general technique to detect outliers in time series data. What exactly counts as an anomaly is itself a subject of debate; however, anomalies usually form a very small part of the dataset and are substantially different from other data points. Anomaly detection can be seen as a specific, extreme form of binary classification, although it is usually treated as a separate area. Most anomaly detection methods are unsupervised, as we are often unlikely to recognize anomalies until they occur. See this paper for more information.
  • Time Series Classification: Similar to other forms of classification, this is where we take a temporal sequence and want to classify it into one of a number of categories. Unlike anomaly detection, we generally have a more balanced number of examples of each class (though it may still be skewed, something like 10%, 80%, 10%).
  • Time Series Forecasting: In forecasting we generally want to predict the next value or the next (n) values in a sequence of temporal data. This is what this article will focus on.
  • Time Series Prediction: I don’t like the use of this term as it is ambiguous and could mean many things. I find that most people use it to refer to either forecasting or classification in this context.
  • Time Series Analysis: A general umbrella term that can include all of the above. However, in my mind I usually associate it more with looking over time series data and comparing different temporal structures than with designing a predictive model. For example, if you did develop a time series forecasting model, then it could tell you more about the causal factors in your time series and enable more time series analysis.

With that said, before even getting started you should determine whether your problem is actually a forecasting problem, as that will guide how you should proceed. Sometimes it might be better to cast a forecasting problem as a classification problem. For example, if the exact number forecasted isn’t that important, you could bucket it into ranges and then use a classification model. Additionally, you should have some understanding of deployment and what the end product will look like. If you require millisecond latency for stock trading, then a huge transformer model with 20 encoder layers probably won’t be viable no matter what your test MAE is.
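The bucketing idea can be sketched in a few lines. This is a minimal illustration, not a Flow Forecast feature: the edges, class names, and the `to_class` helper are hypothetical, and the river-flow values are made up for the example.

```python
from bisect import bisect_right

# Hypothetical bucket edges for a river-flow target; pick edges that matter
# for your own use case (for example, flood-warning thresholds).
BUCKET_EDGES = [100.0, 500.0]
BUCKET_NAMES = ["low", "medium", "high"]  # <100, 100 to <500, >=500


def to_class(value, edges=BUCKET_EDGES):
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(edges, value)


# Made-up continuous targets that a regression model would try to predict exactly.
flows = [12.0, 99.9, 100.0, 340.5, 500.0, 1200.0]

# The same targets recast as class labels for a classification model.
labels = [to_class(v) for v in flows]
print([BUCKET_NAMES[i] for i in labels])
```

Once the target is a class index, the model can be trained with a classification loss instead of a regression loss, which is often an easier problem when only the range matters.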

Data Quality/Preprocessing

  • Always scale or normalize data: Scaling or normalizing your data improves performance in the vast majority of use cases. Unless your values are already very small, this is a step you should always take. Flow Forecast has built-in scalers and normalizers that are easy to use. Failure to scale your data can often cause the loss to explode, especially when training some transformers.
  • Double check for null, improperly encoded, or missing values: I have lost a lot of time and sanity due to data quality issues. Sometimes missing values are encoded in a weird way. For instance, some weather stations encode missing precipitation values as -9999. This can cause a lot of problems, as a regular NA check will not catch it. Flow Forecast does provide a module for interpolating missing values and warning about possibly incorrectly entered data.
  • Start with fewer features: In general it is easier to start with fewer features and add more depending on performance. For instance, when I was forecasting COVID I started with just the mobility data + new cases. As time went on and I got familiar with the general hyper-parameters that worked, I added in weather data.
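The first two tips can be sketched together. The code below is a dependency-free illustration and not Flow Forecast's own scaler or interpolation module: the helper names and the `SENTINELS` set are assumptions for the example. It marks sentinel codes such as -9999 as missing, fills gaps by linear interpolation, and fits the scaling statistics on the training split only.

```python
import math

# Assumed missing-value codes; check your data source's documentation.
SENTINELS = {-9999.0}


def mark_missing(values, sentinels=SENTINELS):
    """Replace NaNs and sentinel codes with None so they count as missing."""
    cleaned = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)) or v in sentinels:
            cleaned.append(None)
        else:
            cleaned.append(float(v))
    return cleaned


def interpolate(values):
    """Linearly interpolate interior gaps; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def fit_scaler(train):
    """Compute mean and standard deviation on the training split only."""
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant series
    return mean, std


def scale(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(x - mean) / std for x in values]


raw = [1.0, 2.0, -9999.0, 4.0, float("nan"), 6.0]  # made-up readings
series = interpolate(mark_missing(raw))
train, test = series[:4], series[4:]
mean, std = fit_scaler(train)
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)  # reuse the training statistics
```

Fitting the scaler on the training split and reusing it for the test split keeps information from the evaluation period out of training. Remember to invert the scaling before reporting forecasts in the original units.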

Model Choice and hyper-parameter selection

  • Visualize time lags to determine forecast_history: In time series forecasting, pretty much regardless of model, we have to choose the number of historical time steps that we pass into the model. This will vary somewhat with architecture, as some models are better able to learn long-range dependencies. However, finding an initial range is useful. In some cases really long-term dependencies might not be useful at all.
  • Start with DA-RNN: I’ve found the DA-RNN model creates a very strong time series baseline. Transformers can outperform it, but they usually require more data and more careful hyper-parameter tuning. Flow Forecast provides an easy-to-use implementation of DA-RNN.
  • Determining a length to forecast: It is tricky to decide which values to search for this hyper-parameter. First, to clarify, this is the number of time steps your model forecasts at once. You can still generate longer forecasts, but you do this by appending the previous forecasts to the input. On the one hand, if your goal is to predict a long range of time steps, then you may want them weighted directly in the loss function. On the other hand, having too many time steps at once can confuse the model. In most of my hyper-parameter sweeps I’ve found a shorter forecast length works well.
  • Start with a low learning rate: I recommend picking a low learning rate for most time series forecasting models.
  • Adam isn’t always the best: I have found that other optimizers can work better. For instance, BertAdam is good for transformer-type models, whereas for DA-RNN vanilla can work well.
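Two of the tips above lend themselves to a small sketch: inspecting lags to pick an initial forecast_history, and producing a long forecast from a short forecast length by appending previous forecasts. This is a plain-Python illustration under stated assumptions: `autocorrelation`, `recursive_forecast`, and `seasonal_naive` are hypothetical helpers rather than Flow Forecast functions, the series is a synthetic sine wave with a period of 12 steps, and the "model" is a naive stand-in for a trained PyTorch model.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def recursive_forecast(model, history, total_steps):
    """Roll a model forward by appending its own forecasts to the input window.

    `model` is any callable mapping a window (list of floats) to a short list
    of predictions; a trained PyTorch module wrapped in a function would also fit.
    """
    window = list(history)
    forecasts = []
    while len(forecasts) < total_steps:
        step = list(model(window))
        if not step:
            raise ValueError("model returned no predictions")
        forecasts.extend(step)
        # Slide the window forward while keeping its length fixed.
        window = (window + step)[-len(history):]
    return forecasts[:total_steps]


# Synthetic series with a period of 12 steps (an assumption for the example).
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

# Use the lag with the strongest autocorrelation as a starting forecast_history.
best_lag = max(range(1, 25), key=lambda lag: autocorrelation(series, lag))
forecast_history = best_lag


def seasonal_naive(window, forecast_length=2):
    """Stand-in model: repeat the oldest values in the window."""
    return window[:forecast_length]


history = series[-forecast_history:]
forecasts = recursive_forecast(seasonal_naive, history, total_steps=24)
```

In practice you would plot the autocorrelation across many lags rather than take a single maximum, and treat the result as the center of a search range. Note that errors compound in a recursive rollout, because each step consumes earlier predictions as if they were observations.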

Robustness

  • Simulate and run play-by-play analysis of different scenarios: Flow Forecast makes it easy to simulate your model performance under different conditions. For instance, if you are forecasting stream flows you might try inputting really large precipitation values and see how the model responds. Or if you are forecasting
  • Double check heatmaps and other interpretability metrics: Several times I’ve looked at models and thought they were performing well. I have then checked the heatmaps and seen that the model wasn’t using the important features in forecasting. When I did further testing, it became obvious the model was just memorizing rather than learning the actual causal effects of the features.
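A scenario check of this kind can be sketched without any library support. The code below is not Flow Forecast's simulation tooling: `run_scenario`, `mean_abs_change`, and `toy_model` are hypothetical, the feature rows are made up, and the toy function stands in for a trained model. It perturbs one input feature at a time and measures how much the forecasts move, which is a simple sensitivity test rather than an attention heatmap.

```python
def run_scenario(model, rows, feature, transform):
    """Return forecasts after applying `transform` to one feature in every row."""
    modified = [{**row, feature: transform(row[feature])} for row in rows]
    return [model(row) for row in modified]


def mean_abs_change(baseline, scenario):
    """Average absolute difference between two lists of forecasts."""
    return sum(abs(b - s) for b, s in zip(baseline, scenario)) / len(baseline)


def toy_model(row):
    """Hypothetical stand-in for a trained model: flow depends on precipitation only."""
    return 5.0 + 0.8 * row["precip"]


# Made-up feature rows for a stream-flow forecast.
rows = [
    {"precip": 0.0, "station_id": 1.0},
    {"precip": 2.0, "station_id": 2.0},
    {"precip": 10.0, "station_id": 3.0},
]

baseline = [toy_model(r) for r in rows]

# Scenario 1: extreme rainfall, ten times the observed precipitation.
extreme_rain = run_scenario(toy_model, rows, "precip", lambda p: p * 10)

# Scenario 2: change a feature that should not matter physically.
shifted_id = run_scenario(toy_model, rows, "station_id", lambda s: s + 100)

precip_effect = mean_abs_change(baseline, extreme_rain)
id_effect = mean_abs_change(baseline, shifted_id)
print(f"precip effect: {precip_effect:.2f}, station_id effect: {id_effect:.2f}")
```

If a real model barely reacts to a feature you know is important, or reacts strongly to one that should be irrelevant, that is a hint it may be memorizing rather than using the inputs sensibly.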

Conclusion

Time series forecasting is a difficult area to master, particularly with DL models. I hope you find these tips useful for improving performance. As always, feel free to leave questions and comments.

