There is an opinion that time series forecasting is a complicated task. But let's not get upset, because there is good news: there are plenty of tasks where several time series appear at once, and those are even more difficult! Once we start comparing, we realize that predicting a univariate time series is not so hard after all (a moment of happiness). But what do we do when the time series is extended with other concurrent sequences of parameters (a multivariate series)? Which methods and algorithms should we use, and what should we do if we face such a forecasting task without much experience? (Spoiler: use AutoML, and while it runs, fill the gap by reading a couple of articles on the subject.)
What is a time series
If we start from the very beginning, a time series is a sequence of values ordered in time. Time series have an important property: the current values of a series are related to the previous ones. If a sequence lacks this property, then congratulations (or not): you are dealing with a process that cannot be predicted by classical (and not-so-classical) models, and you should look at Markov processes instead.
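This property is easy to probe with a few lines of numpy: the correlation between a series and its lagged copy. The helper below is an illustrative sketch, not part of the post's codebase.

```python
import numpy as np

def lag_autocorrelation(series: np.ndarray, lag: int = 1) -> float:
    """Pearson correlation between the series and its lagged copy."""
    return float(np.corrcoef(series[:-lag], series[lag:])[0, 1])

rng = np.random.default_rng(42)
# A smooth (predictable) series vs. pure white noise
smooth = np.sin(np.linspace(0, 10, 200)) + rng.normal(0, 0.05, 200)
noise = rng.normal(0, 1.0, 200)

print(lag_autocorrelation(smooth))  # close to 1: strong temporal dependence
print(lag_autocorrelation(noise))   # close to 0: little to predict from the past
```

A series whose lagged correlations are all near zero leaves classical autoregressive models with nothing to work with.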
The simple picture below illustrates the described property – this background is enough to continue reading the post (Figure 1).
If you would like to know more about how to predict time series (especially with AutoML), I have already posted several articles about this topic: AutoML for time series: definitely a good idea; AutoML for time series: advanced approaches with FEDOT framework.
What is a multivariate time series
Getting closer to the main topic. A multidimensional (multivariate) time series is a system of several one-dimensional (univariate) series, where the values of one of them (the target) depend not only on its own previous values but also on the previous values of one or several additional series.
An example of what a multivariate time series might look like is shown in Figure 2.
As can be seen from the picture, in order to predict the future states of time series 1, we may use the values of time series 2 and 3. If you are interested in a real example of such a system, Figure 3 shows the sea level measured at different points.
Two different cases of "multivariate time series forecasting" are worth highlighting here. In the first one, data for the exogenous time series are available only for the current time index and historical values, i.e., t, t−1, t−2, …, t−n, while the values at the forecast indices t+1, t+2, …, t+f (where f is the forecast horizon) are unknown. In the second scenario, the values at the forecast indices are unknown for the target time series, but the exogenous series are known at these points in time. This is shown most clearly in the diagram (see Figure 4).
In the first case, we deal with the classical time series forecasting task, because future states are predicted. In the second case, we solve the regression problem, but because the data are ordered in time, we can also call the case "dynamic regression," which takes into account the data’s lag dependencies. Further in the text, we will talk only about the first case, since the second one is a special case of the regression problem.
P.S. If you want to know more about dynamic regressions using time patterns, clap this article 🙂 That way, I will know that this topic is interesting to you and I will prepare a new post related to that field.
How can we predict such series
The behavior of a system of time series is classically predicted using vector autoregression (VAR). VAR generalizes the idea of autoregression to multiple time series, so it is considered the classic instrument for multivariate series forecasting. Several time series can also be included in other classical prediction models, such as ARMA; such models are called vector autoregressive moving average (VARMA) models. Recurrent neural networks are another highly popular tool for this task. "More advanced" approaches, for example those based on optimizing VAR, can usually be found in scientific articles under the keywords "multivariate time series forecasting".
Let’s try to consider the approach of generalizing the already known method of predicting univariate time series to the multivariate case. As described in the post AutoML for time series: advanced approaches with FEDOT framework, time series can be predicted using both classical models (AR, ARIMA) and regression machine learning models. In order to use regression models, it is necessary to make a special transformation of the series into a matrix. For the univariate case, the transformation scheme is shown in the picture below (Figure 5).
The following table will be obtained for multivariate time series (Figure 6). The moving window size and forecast horizon are the same as in the previous figure.
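The transformation shown in Figures 5 and 6 can be sketched in a few lines of numpy. The window size, horizon, and toy series below are illustrative, not the values from the figures.

```python
import numpy as np

def lagged_table(series: np.ndarray, window: int, horizon: int):
    """Turn a univariate series into a supervised (features, target) table."""
    n_rows = len(series) - window - horizon + 1
    features = np.array([series[i:i + window] for i in range(n_rows)])
    target = np.array([series[i + window:i + window + horizon]
                       for i in range(n_rows)])
    return features, target

target_series = np.arange(10, dtype=float)      # stand-in for time series 1
exog_series = np.arange(100, 110, dtype=float)  # stand-in for time series 2

X_t, y = lagged_table(target_series, window=3, horizon=2)
X_e, _ = lagged_table(exog_series, window=3, horizon=2)

# For the multivariate case, the lagged blocks are concatenated column-wise
X = np.hstack([X_t, X_e])
print(X.shape, y.shape)  # (6, 6) (6, 2)
```

Each row of `X` now holds the recent past of every series, and the corresponding row of `y` holds the future values of the target to be predicted.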
The resulting table with features can be used as a training sample for a machine learning model, such as ridge regression or random forest, or for any other suitable algorithm. In this way, the model will generate predictions based on previous values of both the target variable and the exogenous time series. There are two disadvantages to this concept:
- The size of the moving window, which is applied to all time series at the same time, may not be optimal. For one of the series, it could be enough to "look into the past" by only 5–10 elements in order to successfully predict future states, while for another series a lag shift of hundreds of elements will be required;
- There is no guarantee that the exogenous time series is correlated with the target one.
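Despite these caveats, the table-plus-regressor concept itself is easy to try. Below is a minimal sketch with scikit-learn's ridge regression on synthetic data; the window size, horizon, and series are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def lagged_table(series, window, horizon):
    """Univariate series -> supervised (features, target) table."""
    n_rows = len(series) - window - horizon + 1
    X = np.array([series[i:i + window] for i in range(n_rows)])
    y = np.array([series[i + window:i + window + horizon] for i in range(n_rows)])
    return X, y

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 300)
target = np.sin(t) + rng.normal(0, 0.05, t.size)
exog = np.cos(t) + rng.normal(0, 0.05, t.size)  # correlated exogenous series

window, horizon = 10, 5
X_t, y = lagged_table(target, window, horizon)
X_e, _ = lagged_table(exog, window, horizon)
X = np.hstack([X_t, X_e])  # features from both series

# Time-ordered split: the last rows are held out for validation
split = len(X) - 20
model = Ridge().fit(X[:split], y[:split])
mae = mean_absolute_error(y[split:], model.predict(X[split:]))
print(f'MAE on hold-out: {mae:.3f}')
```

Ridge regression handles the multi-column `y` natively, so the whole horizon is predicted in one call; any other multi-output regressor could be swapped in.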
It is possible to cope with these problems. And then we will consider how to do it. For now, let’s concentrate on how we can run the AutoML tool to solve the problem of multivariate time series forecasting.
Data description
The demonstration will be done using sea level data from the model NEMO (Nucleus for European Modelling of the Ocean). This model allowed us to prepare a dataset with SSH (sea surface height) measurements. We have collected 25 time series. Their location on the map diagram can be seen in the picture below (Figure 7).
_The dataset is synthetic and was generated by our team, namely by Julia Borisova. Feel free to use it for your purposes under BSD 3-Clause "New" or "Revised" License: SSH dataset can be obtained via link [2]._
The time series we are interested in is highlighted in the picture by a square. It is the one we are going to predict using the historical values of the parameter at this point and using the history of the neighboring ones. The forecast horizon is 50 elements.
The whole dataset (named SSH) as well as the code to run it can be found in the repository pytsbe.
How such series are predicted in FEDOT
Let’s move on to the code example. So, the task is to use 25 time series to predict the value of one of them (the target) in the future.
To run the example, use the following code (our time series are numpy arrays; an example of launching is available here; the Fedot version used for the experiments is 0.6.0):
We form a dictionary with time series, which will be used as features in the model.
A few words about the configuration options
If for some reason we do not want to include all of the time series in the model, we can form a dictionary with only those key–value pairs that are needed. So if there are a lot of time series, making the model too complex and obscenely slow to fit, pruning the feature space may be an acceptable solution.
If there is no need to run the full AutoML algorithm, but it is necessary to build at least a simple model in a limited time, we can use the parameter predefined_model='auto' in the fit method. As you may know, the AutoML core of FEDOT is based on an evolutionary algorithm, which generates one or more pipelines (initial assumptions) before the optimization starts. These initial assumptions are then modified iteratively, and more and more precise solutions are obtained.
So, back to the predefined_model='auto' parameter. The AutoML algorithm will generate an initial assumption, but the optimization process will not start. The algorithm will only train this one auto-generated initial assumption and return the fitted pipeline. This allows us to quickly prepare a model without running the AutoML core. That is, we use a predefined pipeline configuration (in this case, automatically generated by the algorithm) and train only it and nothing else. Naturally, this requires much less time than running the full AutoML search. An example of how this looks in the code is shown below:
Let’s go back to evolution (in this experiment we don’t use predefined_model): after a certain number of generations, the following result was obtained (Figure 8).
It looks good. Let’s try to figure out what’s going on internally.
Disclaimer about visualizations
We have recently updated the visualization tools, so the display of graphs (pipelines) may differ from the pictures in this post when using the FEDOT framework version 0.6.0 or later.
How such series are predicted by FEDOT
Let’s consider the process of searching for a solution. First, the algorithm needs something to start the evolution process with – some valid initial assumptions. Such initial assumptions are generated automatically, and the following pipelines are obtained – check Figure 9.
What happens here? First, the time series are passed to the "data source" nodes (1). These nodes do not transform their time series; they simply pass them on.
This was done to make it possible to connect new data sources to the pipeline. It also helps to understand which time series are transformed in the pipelines (note the names after the slash: they show the labels of the time series, so we can identify them). Thus, it is clear that a lagged transformation was applied to the time series named 0 and that a copy of this series was transferred to the GLM model (2). GLM is a generalized linear model that can work directly with the time series and does not require a lagged transformation. The output of the lagged transformation is then passed to the ridge regression model (3).
Then predictions for each time series from two models (GLM and ridge) are transferred to the ridge regression model (4), and the final model in this case is ridge regression too. Don’t worry about the large number of ridge regression models – this is just an initial assumption, further evolution will select more optimal models in the nodes.
Then, based on this initial assumption, an initial population of pipelines for the evolutionary algorithm is generated via the use of mutations. In the process of optimizing the structure of AutoML pipelines, the algorithm can change operations (models) in nodes, remove nodes and edges, add new ones, and tune the hyperparameters of operations. It means that it becomes possible to obtain the following structures (Figure 10).
As can be seen from Figure 10, if necessary, we can remove nodes and entire branches. Thus, we can get rid of time series that are not related to the target.
During graph optimization, the moving window size can also be tuned for each of the series separately (regulated by the window_size hyperparameter of the lagged transformation). Together, branch removal and per-series window tuning address both disadvantages listed above.
What’s all this "agony" for?
Obviously, to reduce forecasting error 🙂 But let’s be serious, the question is really good – it is time to check whether the implemented approach gives at least some advantages. To make sure that the use of exogenous time series really affects the error of the final forecast, let’s conduct an experiment.
We will predict the time series using the initial assumption automatically generated by the algorithm. In this case, we will use the historical values of the target time series as predictors, and then we will iteratively expand the number of additional time series. For the last 50 elements of the target time series, we will compare predicted and actual values. Metric: mean absolute error (MAE). The results of this test are shown in the animation below:
Figure 11 shows the dependence of the error on the validation sample on the number of series included in the model.
As can be seen from the figure, the number of time series included in the model affects the error value. NB: in this experiment we did not tune the hyperparameters of the pipelines, and we did not modify the initial structure using the evolutionary algorithm. When a model uses numerous time series, there are more opportunities for improvement than for a simple model; therefore, the advantage of the more complex model may potentially increase after hyperparameter tuning and other optimization procedures.
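The idea of this experiment can also be sketched outside FEDOT. Below is an illustrative loop with scikit-learn's ridge regression on synthetic series (not the actual SSH data): exogenous series are added one by one and the MAE on the last rows of the table is recorded.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def lagged_table(series, window, horizon):
    """Univariate series -> supervised (features, target) table."""
    n = len(series) - window - horizon + 1
    X = np.array([series[i:i + window] for i in range(n)])
    y = np.array([series[i + window:i + window + horizon] for i in range(n)])
    return X, y

rng = np.random.default_rng(1)
t = np.linspace(0, 40, 600)
target = np.sin(t) + rng.normal(0, 0.1, t.size)
# Exogenous series: shifted copies of the target's signal plus noise
exogenous = [np.sin(t - 0.5 * k) + rng.normal(0, 0.1, t.size) for k in range(1, 6)]

window, horizon = 20, 5
X_t, y = lagged_table(target, window, horizon)
split = len(X_t) - 50  # hold out the last 50 rows for validation

maes = []
for n_exog in range(len(exogenous) + 1):
    blocks = [X_t] + [lagged_table(s, window, horizon)[0]
                      for s in exogenous[:n_exog]]
    X = np.hstack(blocks)
    model = Ridge().fit(X[:split], y[:split])
    mae = mean_absolute_error(y[split:], model.predict(X[split:]))
    maes.append(mae)
    print(f'{n_exog} exogenous series: MAE = {mae:.3f}')
```

On real data the curve is rarely monotonic: some added series help, others only inflate the feature space, which is exactly why pruning and per-series window tuning matter.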
Conclusion
In this post, we considered the existing approaches to predicting multivariate time series, discussed how such series can be predicted with AutoML, and then looked in detail at how multivariate time series forecasting is designed in the AutoML tool FEDOT.
Forecast time series, use FEDOT!
Useful links:
- The repository with the open-source AutoML framework FEDOT
- Open-source tool for launching Time Series Forecasting algorithms and benchmarking – pytsbe (the module is pretty new, so if you want to take part in its open-source development – get involved!)
- Our new paper about the AutoML approach for time series forecasting (presented at the Congress on Evolutionary Computation) – Evolutionary Automated Machine Learning for Multi-Scale Decomposition and Forecasting of Sensor Time Series
Datasets used in this post (& licenses):
- Copernicus "Sea level daily gridded data from satellite observations". Ref.: Taburet, G., Sanchez-Roman, A., Ballarotta, M., Pujol, M.I., Legeais, J.F., Fournier, F., Faugere, Y., Dibarboure, G.: Duacs DT2018: 25 years of reprocessed sea level altimetry products, 2019. Link to the dataset – Copernicus Dataset. License – Licence Agreement. Links are up to date as of 2 November 2022
- Sea surface height (SSH) synthetic dataset. Link to the dataset – SSH dataset. License – BSD 3-Clause "New" or "Revised" License. Links are up to date as of 2 November 2022
An explanation of how multivariate time series can be predicted was prepared by me (Mikhail Sarafanov) and the NSS lab team