
The rise of Generative AI and Large Language Models (LLMs) has captivated the world, sparking a revolution across many fields. While the primary focus of this technology has been on text sequences, growing attention is now being given to extending its capabilities to handle and process data formats beyond text inputs.
Like most AI areas, time series forecasting is not immune to the advent of LLMs, and that may be good news for everyone. Time series modeling is known to be something of an art, where results depend heavily on prior domain knowledge and careful tuning. LLMs, by contrast, are appreciated for being task-agnostic, holding enormous potential to apply their knowledge to varied tasks from different domains. From the union of these two areas, a new generation of time series forecasting models may be born, capable of achieving previously unthinkable results.
![[Image by the author]](https://towardsdatascience.com/wp-content/uploads/2024/07/18yrsEulbJD8oCjTUI1JV_g-1.png)
The adoption of deep learning in time series forecasting is not new. Since the presentation of the paper "Attention Is All You Need", transformers have found applications in various domains, including time series forecasting. Since LLMs are transformer-based architectures, one of the main challenges for these networks is overcoming the lack of semantics in numerical data sequences. This is usually not a serious concern for semantics-driven applications such as NLP (the meaning of a sentence is largely preserved even if some of its words are reordered), but it is a real problem for time series and a major challenge for future LLM development.
In this post, we don’t aim to revolutionize the time series forecasting ecosystem or propose a revolutionary generative approach. Instead, we take a step back and try to understand how to carry out zero-shot time series forecasting with standard machine learning models, aware that simple methods often tend to outperform more complex approaches in time series forecasting.
When thinking about time series forecasting benchmarks, the first things that come to mind are the Makridakis competitions (M1 [1], M3 [2], M4 [3]), which evaluate forecasting methods on real-world datasets to demonstrate their effectiveness in practice. From the first edition, launched in 1982, classic statistical forecasting methods tended to dominate. Over later editions, new forecasting approaches came to the fore, such as hybrid and ensemble methodologies.
Gradient boosting models, in particular, have exhibited their superiority over other approaches. Their fast training, minimal data pre-processing requirements, and explanatory power make them the Swiss Army knife of data science for time series forecasting. At the same time, linear baselines and statistical methods often provide strong alternatives or good candidates for an ensembling strategy.
TRANSFER LEARNING with TIME SERIES
Our goal is to perform forecasting on unseen time series using a forecasting model trained on a different data source, without any retraining or adaptation. The underlying assumption is that time series from different domains can share common features. This technique is also known as transfer learning, which generally refers to the ability of models to adapt to new tasks (predicting a new time series, in our scenario) without further training.
This approach has proven remarkably effective in various areas, including computer vision and natural language processing (NLP). It emerges as a game changer when data is scarce or when the time required to develop a model from scratch is excessively long. However, care must be taken to ensure similarity between the distributions of the datasets: if the training domain differs significantly from the target, transfer may harm the model's predictive ability compared to modeling the target with dedicated approaches.
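One simple, illustrative way to sanity-check distributional similarity between source and target values (assuming scipy is available; this check is not part of the article's pipeline) is a two-sample Kolmogorov–Smirnov test:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
source = rng.normal(loc=0.0, scale=1.0, size=500)          # stand-in for training-domain values
target_similar = rng.normal(loc=0.0, scale=1.0, size=500)  # same distribution
target_shifted = rng.normal(loc=3.0, scale=1.0, size=500)  # strongly shifted distribution

stat_ok, p_ok = ks_2samp(source, target_similar)
stat_bad, p_bad = ks_2samp(source, target_shifted)
print(stat_ok, stat_bad)  # the shifted target yields a much larger KS statistic
```

A tiny p-value (large KS statistic) flags a distribution mismatch, suggesting transfer from that source may hurt rather than help.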
Let’s see Transfer Learning in action…
As a first step, we download and store the M3 and M4 datasets as-is:
```python
from datasetsforecast.m3 import M3
from datasetsforecast.m4 import M4

df_m4, _, _ = M4.load(directory='./', group='Monthly')
df_m3, _, _ = M3.load(directory='./', group='Monthly')

# Encode the series identifiers as integers
df_m4['unique_id'] = df_m4['unique_id'].factorize()[0]
df_m3['unique_id'] = df_m3['unique_id'].factorize()[0]
```
The Makridakis competitions were organized in collaboration with the International Institute of Forecasters, which makes the competition data freely available for anyone to use for research and dissemination purposes without further permission (as stated here).
Then we instantiate a recursive forecasting model, choosing any model as the base estimator, and train it once on the whole M4 dataset with a global approach. By "global forecasting" we mean a single predictive model built on all the provided time series simultaneously, to achieve better generalization by capturing the common patterns underlying the system (besides better resource management and maintainability).
```python
import lightgbm as lgb
from tspiral.forecasting import ForecastingCascade

model = ForecastingCascade(
    lgb.LGBMRegressor(random_state=42, n_jobs=-1, verbose=0),
    lags=range(1, 12 + 1),
    groups=[0],
    target_standardize=True,
).fit(df_m4[['unique_id']], df_m4['y'])
```
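The recursive strategy used by such a forecaster can be illustrated with a minimal sketch (not tspiral's actual internals): a one-step-ahead model is fit on lag features, then rolled forward by feeding each prediction back as the newest lag. A plain linear model stands in for the LightGBM estimator:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly-like series
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * np.arange(120) / 12) + 0.05 * rng.standard_normal(120)

# One-step-ahead model on the 12 previous values (oldest-to-newest order)
n_lags = 12
X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
model = LinearRegression().fit(X, y[n_lags:])

# Recursive 12-step-ahead forecast: each prediction becomes an input lag
history = list(y)
forecasts = []
for _ in range(12):
    x_new = np.array(history[-n_lags:]).reshape(1, -1)
    y_hat = model.predict(x_new)[0]
    forecasts.append(y_hat)
    history.append(y_hat)

print(len(forecasts))  # 12
```

This is why only lagged values of the target (plus the series identifier, for the global model) are needed at prediction time.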
Finally, we evaluate our trained model on the M3 dataset: we split it into different folds and make forecasts using the model previously trained on M4 (general model). At the same time, we also fit the same model on each M3 training fold and use it to make predictions on the corresponding test portions (specific model).
![[Image by the author]](https://towardsdatascience.com/wp-content/uploads/2024/07/1DkUJBsta12UeEN24jcve3w-1.png)
```python
import pandas as pd
from sklearn.base import clone
from tspiral.model_selection import TemporalSplit

CV = TemporalSplit(4, test_size=12, gap=12)

preds = []
for i, (train_id, test_id) in enumerate(
    CV.split(df_m3['y'], None, df_m3['unique_id'])
):
    preds.append(
        df_m3.iloc[test_id].assign(
            cv_fold=i + 1,
            # Model fit on the M3 training fold (specific model)
            model_specific=clone(model).fit(
                df_m3[['unique_id']].iloc[train_id],
                df_m3['y'].iloc[train_id],
            ).predict(
                X=df_m3[['unique_id']].iloc[test_id],
                last_X=df_m3[['unique_id']].iloc[train_id],
                last_y=df_m3['y'].iloc[train_id],
            ),
            # Model previously trained on M4 (general model)
            model_general=model.predict(
                X=df_m3[['unique_id']].iloc[test_id],
                last_X=df_m3[['unique_id']].iloc[train_id],
                last_y=df_m3['y'].iloc[train_id],
            ),
        )
    )
preds = pd.concat(preds, axis=0, ignore_index=True).dropna()
```
RESULTS
At this point, we only need to compute the forecasting errors. The average Root Mean Squared Error (RMSE) over the test folds is chosen as the error measure.
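The metric can be computed with a few lines of pandas; the toy `preds` frame below is a hypothetical stand-in mirroring the columns built above (`y`, `model_specific`, `model_general`, `cv_fold`):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `preds` frame: actuals, the two models' forecasts,
# and the CV fold label for each test observation.
preds = pd.DataFrame({
    'y':              [10., 12., 11., 13., 9., 14.],
    'model_specific': [11., 11., 12., 12., 10., 13.],
    'model_general':  [10., 12., 11., 14., 9., 14.],
    'cv_fold':        [1, 1, 1, 2, 2, 2],
})

def avg_rmse(df, col):
    """RMSE per CV fold, averaged across folds."""
    se = (df['y'] - df[col]) ** 2
    per_fold_rmse = np.sqrt(se.groupby(df['cv_fold']).mean())
    return per_fold_rmse.mean()

print(avg_rmse(preds, 'model_specific'))  # 1.0
print(avg_rmse(preds, 'model_general'))
```

Averaging per-fold RMSEs (rather than pooling all errors) weights each forecast horizon equally.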
![Forecasting errors on test data from various approaches. Forecasting errors of also dummy methods are reported as a measure of goodness. [Image by the author]](https://towardsdatascience.com/wp-content/uploads/2024/07/10h5ynZQGM00Q-M_a_YL7Cg-1.png)
![Forecasting errors on test data from various approaches. Forecasting errors of also dummy methods are reported as a measure of goodness. [Image by the author]](https://towardsdatascience.com/wp-content/uploads/2024/07/1LSyqqQW5Yh0WgtazqimYbw-1.png)
Training an LGBMRegressor and an SGDRegressor on M4 and running inference directly on M3 (general model) results in better performance than inference with the same models trained on M3 itself (specific model).
SUMMARY
In this post, we presented a practical application of zero-shot forecasting on a real-world dataset. Applying a simple transfer learning approach, we demonstrated how to use classical machine learning models to make forecasts without further training. Although the approach was successful on the selected dataset, it is not guaranteed to work everywhere. Applying zero-shot predictions can be difficult, but with the large amount of freely available datasets nowadays, it is an approach worth trying. Especially with the rise of LLMs for time series forecasting, it is worth remembering that the same results can be achieved with no extra effort through a classical approach.
REFERENCES
[1] Makridakis, Spyros, et al. (1979). Accuracy of Forecasting: An Empirical Investigation. Journal of the Royal Statistical Society. Series A (General), vol. 142, no. 2, pp. 97–145. https://doi.org/10.2307/2345077.
[2] Makridakis, S., & Hibon, M. (2000). The M3-Competition: Results, conclusions and implications. International Journal of Forecasting, 16(4), 451–476. https://doi.org/10.1016/S0169-2070(00)00057-1.
[3] Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2019). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54–74. https://doi.org/10.1016/j.ijforecast.2019.04.014.
Keep in touch: Linkedin