
AutoML for time series: advanced approaches with FEDOT framework

An example of using FEDOT and other AutoML libraries on real-world data with gaps and non-stationarity

Thoughts and Theory

AutoML framework FEDOT for time series forecasting (image by author)

As we noted in our previous post, most modern open-source AutoML frameworks do not cover time series forecasting tasks extensively. In that post, we gave a preliminary demonstration of the forecasts the AutoML approach can produce.

Here, however, we can go deeper into one of these AutoML frameworks, FEDOT, which can automate machine learning pipeline design for time series forecasting. We will explain in detail what is going on in the core of FEDOT using real-world time series forecasting tasks.

The FEDOT framework and time series forecasting

Earlier, we talked about pipelines for machine learning problems. A pipeline is a directed acyclic graph. In FEDOT terms, this graph is called a chain, a composite model, or a pipeline.

The basic abstractions that FEDOT operates on are:

  • An operation is an action performed on the data: either a preprocessing step (normalization, standardization, gap filling) or a machine learning model that produces predictions;
  • A node is a container in which an operation is placed. One node can hold only one operation. A primary node accepts only raw data, while a secondary node uses the outputs of previous-level nodes as predictors;
  • A chain (or pipeline) is a directed acyclic graph of nodes. Machine learning pipelines in FEDOT are implemented through the Chain class.

The given abstractions can be seen in the figure below:

Operations, nodes, and chains (pipelines) in the FEDOT framework (image by author)

Both machine learning models and classical models, such as autoregression (AR) for time series, can be inserted into the structure of such a pipeline.
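To make these abstractions concrete, here is a toy sketch in plain Python. The class names and behavior are simplified stand-ins, not FEDOT's actual API: a node wraps exactly one operation, a primary node consumes raw data, and a secondary node consumes its parents' outputs.

```python
# Toy illustration of FEDOT-style abstractions; these are
# simplified stand-ins, not FEDOT's real classes.

class Node:
    """A container holding exactly one operation (a callable)."""
    def __init__(self, operation, parents=None):
        self.operation = operation
        self.parents = parents or []  # no parents -> primary node

    def run(self, raw_data):
        if not self.parents:                  # primary node: raw data in
            return self.operation(raw_data)
        # secondary node: outputs of parent nodes become its inputs
        inputs = [p.run(raw_data) for p in self.parents]
        return self.operation(inputs)

# Two primary "preprocessing" operations and one secondary node
# that merges their outputs -- together they form a small DAG (chain)
scale = Node(lambda xs: [x / max(xs) for x in xs])
shift = Node(lambda xs: [x - min(xs) for x in xs])
merge = Node(lambda outs: [sum(vals) / len(vals) for vals in zip(*outs)],
             parents=[scale, shift])

result = merge.run([2.0, 4.0, 8.0])
```

Running the chain from its final node traverses the graph recursively, which is the essence of how a composite model produces a prediction.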

Well, we know how to solve classification or regression problems. And we even know how to make a pipeline of models in FEDOT. But how do we get to time series prediction? And how can we use, for example, a decision tree? Where are the features?

The features are right here! To build a table of features, you only need to walk along the time series with a sliding window and prepare a trajectory matrix.

It is worth noting that this is not our invention: you can read about the SSA method, which uses this transformation. The same approach is used in one version of the H2O library. Applying almost any machine learning model to time series comes down to constructing such matrices.

Let’s analyze this transformation in more detail. A time series is a sequence of values in which subsequent values usually depend on previous ones, so we can use the current and previous elements of the series to make a prediction. Imagine that we want to predict the series one element ahead, using the current value and one previous value:

Example of making a table with features for time series prediction (image by author)

We call this transformation the "lagged transformation" of a time series. In FEDOT, it is implemented as a separate "lagged" operation. Its most important hyperparameter is the size of the sliding window, which determines how many previous values are used as predictors.
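As a minimal sketch of the lagged transformation (plain Python for illustration, not FEDOT's actual `lagged` implementation), the sliding window forms the feature table like this:

```python
def lagged_transform(series, window):
    """Slide a window over the series: each feature row holds `window`
    consecutive values, and the target is the next value after them."""
    features, target = [], []
    for i in range(len(series) - window):
        features.append(series[i:i + window])
        target.append(series[i + window])
    return features, target

X, y = lagged_transform([10, 20, 30, 40, 50], window=2)
# X -> [[10, 20], [20, 30], [30, 40]], y -> [30, 40, 50]
```

The rows of X form the trajectory matrix, and any tabular model (ridge regression, a decision tree, KNN) can then be trained on X and y.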

Below is an animation of multi-step forecasting one element at a time. However, a single prediction step can also cover several elements ahead at once; in this case, a multi-target regression problem is solved. You can see the whole prediction process, from forming the trajectory matrix (or lagged table) to making the prediction:

Animation. Predicting 3 elements ahead with lagged time series transformation (animation by author)
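The multi-target variant can be sketched the same way: each row of the lagged table gets several future values as its target (again a plain-Python illustration, not FEDOT's code):

```python
def lagged_multi_target(series, window, horizon):
    """Each row: `window` past values as features and the next
    `horizon` values as a multi-output target."""
    X, Y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])
        Y.append(series[i + window:i + window + horizon])
    return X, Y

X, Y = lagged_multi_target([1, 2, 3, 4, 5, 6], window=2, horizon=3)
# X -> [[1, 2], [2, 3]], Y -> [[3, 4, 5], [4, 5, 6]]
```

A multi-output regressor trained on such a table predicts all `horizon` future values in one step instead of iterating one element at a time.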

Any machine learning model can be used as the predictive model. We have also implemented several time-series-specific models in FEDOT (such as AR and ARIMA), as well as time-series-specific preprocessing methods, such as moving-average smoothing and Gaussian smoothing.

There is no automatic machine learning here yet. The framework "comes to life" when its intelligent part, the composer, is launched. The composer is the interface for building pipelines. Internally, it uses an optimization method, which implements the "automatic" part of AutoML. By default, the framework uses an evolutionary approach based on the principles of genetic programming, but if necessary, any search algorithm can be plugged into the composer, from random search to Bayesian optimization.

AutoML works in two stages:

  • Composing is the search for the structure of the pipeline. By default, an evolutionary algorithm is used for this purpose. At this stage, the operations in the nodes are changed, and subtrees are removed from some solutions and "grown" onto others. Hyperparameters of the operations in the nodes are also mutated here;
  • Hyperparameter tuning is a process in which the pipeline structure stays fixed but the hyperparameters in the nodes change. This stage starts after composing has finished.
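As a toy illustration of the composing stage (not FEDOT's actual genetic programming engine; here a "pipeline" is just a list of operation names and the fitness function is hypothetical), a mutation-based search might look like this:

```python
import random

# Toy sketch of composing: a "pipeline" here is just a list of
# operation names, not FEDOT's real graph representation.
OPERATIONS = ['lagged', 'smoothing', 'ridge', 'knn', 'ar']

def mutate(pipeline, rng):
    """Apply one structural mutation: replace, add, or drop an operation."""
    pipeline = list(pipeline)
    action = rng.choice(['replace', 'add', 'drop'])
    if action == 'replace':
        pipeline[rng.randrange(len(pipeline))] = rng.choice(OPERATIONS)
    elif action == 'add':
        pipeline.insert(rng.randrange(len(pipeline) + 1),
                        rng.choice(OPERATIONS))
    elif len(pipeline) > 1:   # drop a node, but keep at least one
        pipeline.pop(rng.randrange(len(pipeline)))
    return pipeline

def compose(fitness, generations=50, seed=0):
    """Greedy mutation search: keep a mutant only if it improves fitness."""
    rng = random.Random(seed)
    best = ['lagged', 'ridge']
    for _ in range(generations):
        candidate = mutate(best, rng)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best

# Hypothetical fitness: shorter pipelines ending with a model are "better"
def toy_fitness(pipeline):
    ends_with_model = bool(pipeline) and pipeline[-1] in ('ridge', 'knn', 'ar')
    return len(pipeline) + (0 if ends_with_model else 10)

found = compose(toy_fitness)
```

A real composer evaluates fitness by cross-validated forecast error and uses a population with crossover and selection rather than this single-individual hill climb, but the replace/add/drop mutations shown here are the same kind of moves applied to the pipeline graph.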

Below is an example of the mutation transformations performed on the pipeline during the composing stage:

Animation. The process of mutations in the pipeline during composing. Various mutation operators are shown that change hyperparameters in nodes, replace operations, add nodes. Crossover operators are not shown (animation by author).

During evolution, the most accurate models are selected. So, at the end of composing there is a pipeline with a fixed structure, and we only need to configure the hyperparameters in its nodes.

The hyperparameters are tuned simultaneously in all nodes of the pipeline using optimization methods from the hyperopt library:

Animation. The process of parameters tuning in the composite model nodes (animation by author)

After completing all the stages, we will get the final pipeline.

Data we have

In (non-scientific) machine learning articles, it is common to use relatively simple time series to demonstrate an algorithm’s effectiveness. One of the most popular is "US airline passengers"; the plot below shows what it looks like:

US airline passengers dataset (image by author)

It is very tempting to demonstrate the library’s capabilities on such a time series; however, almost any moderately complex model can provide an adequate forecast for it. To show the full capabilities of AutoML algorithms, we decided to take a real-world dataset instead. We hope this example will be good enough for that demonstration.

There are two time series: the first is the average daily electricity generation of a wind farm, and the second is the average daily electricity generation of a diesel generator. Both are measured in kilowatt-hours.

Electric power generation obtained from diesel and a wind power generator (image by author)

The electricity generation of wind turbines is highly dependent on wind speed, and when the wind speed drops, the diesel generator is switched on to keep generation at a sufficient level. So, when the power output of the wind turbine falls, the output of the diesel generator rises, and vice versa. It is also worth noting that the time series contain gaps.

We do not reproduce the full programming code in this post; instead, for better perception, we have prepared a large number of visualizations. The complete code, where all the technical aspects are described in much more detail, is available in the Jupyter notebooks.

Task

The task is to build a model which will forecast the diesel electric power generation 14 days ahead.

Gap-filling

The first problem is the presence of gaps in the raw time series. FEDOT provides three groups of methods for time series gap-filling:

  • Simple methods such as linear interpolation;
  • Iterative forecasting methods using single time series forecasting models;
  • Advanced forecasting schemes for filling in gaps.

Methods from the first group are fast but imprecise. Methods from the second group do not take the specifics of the problem into account and are equivalent to simply forecasting the time series. The last group addresses the drawbacks of the previous two, so we will apply methods from the third group here. The composite model uses a bi-directional time series forecast to fill in the gaps.
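A minimal sketch of the first group of methods, linear interpolation over a gap, might look as follows (plain Python for illustration; the sentinel gap marker is an assumption, and this is not FEDOT's implementation):

```python
def fill_gap_linear(series, gap_value=-100.0):
    """Fill gaps (marked with the sentinel gap_value) by linear
    interpolation between the nearest known neighbours.
    Assumes gaps do not touch the series boundaries."""
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] == gap_value:
            start = i
            while i < len(filled) and filled[i] == gap_value:
                i += 1                      # find the end of the gap
            left, right = filled[start - 1], filled[i]
            step = (right - left) / (i - start + 1)
            for k in range(start, i):
                filled[k] = left + step * (k - start + 1)
        else:
            i += 1
    return filled

restored = fill_gap_linear([10.0, -100.0, -100.0, 40.0])
# -> [10.0, 20.0, 30.0, 40.0]
```

This is fast, but it simply draws a straight line through the gap, which is exactly why it is imprecise for series with pronounced fluctuations.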

Example of a combined forecast, where two models are used, and the result of their forecast is combined using a weighted average (image by author)

To fill in the gaps in a time series, we create a simple pipeline of Gaussian smoothing, lagged transformation, and ridge regression, and train this pipeline to make forecasts into the "future".

The structure of the obtained pipeline for restoring gaps in the time series (image by author)

We then repeat this in the opposite direction: we train the pipeline to predict the "past". Afterward, the two forecasts are combined by averaging.

The sequence of actions in this approach can be described as follows. First, the part of the time series located to the left of the gap is used: a composite model is trained on this part to forecast as many elements ahead as there are in the gap. After that, the procedure is repeated for the right part: the known part of the time series is inverted, the model is trained, a forecast is made, and the resulting forecast is inverted back. The two forecasts are combined using a weighted average: each forecast gets more weight at points closer to the known part of the series from which it was made. That is, the red forecast (in the figure) has more weight in the left part of the gap, and the green one in the right part.
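The weighted averaging of the two forecasts can be sketched as follows (a plain-Python illustration of the idea, with linear weights as an assumption; FEDOT's actual weighting scheme may differ):

```python
def combine_bidirectional(forward, backward):
    """Weighted average of two forecasts over a gap: the forward
    forecast (made from the left part) gets more weight near the
    left edge, the backward one near the right edge."""
    n = len(forward)
    combined = []
    for i in range(n):
        w_right = (i + 1) / (n + 1)   # grows towards the right edge
        w_left = 1.0 - w_right
        combined.append(w_left * forward[i] + w_right * backward[i])
    return combined

gap_fill = combine_bidirectional([10.0, 10.0, 10.0], [20.0, 20.0, 20.0])
# -> [12.5, 15.0, 17.5]
```

The combined series transitions smoothly from the forward forecast at the left edge of the gap to the backward forecast at the right edge, which avoids a visible jump at either boundary.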

After applying the gap-filling algorithm, we get the following result:

The filled gap in the time series of power generation obtained from diesel generator (image by author)

Quite good, isn’t it? But the second time series still has a gap in its central part. We could apply the previous approach to this gap too, but another approach exists: we match the values of the two time series using pair regression and restore the wind turbine power generation (target) using the diesel generator series as a single predictor. We will also solve this regression problem using the FEDOT framework.
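The idea of pair regression can be sketched with ordinary least squares on a single predictor (toy numbers for illustration; in the post, the regression pipeline itself is built by FEDOT):

```python
def fit_pair_regression(x, y):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    return a, b

# Diesel output as predictor, wind output as target (toy numbers);
# the two series run in opposite phase, so the slope comes out negative.
diesel = [100.0, 200.0, 300.0, 400.0]
wind = [900.0, 800.0, 700.0, 600.0]
a, b = fit_pair_regression(diesel, wind)
restored = a * 250.0 + b   # estimate the wind output for a gap point
```

Because the two series move in anti-phase, the known diesel values inside the wind series' gap carry real information about the missing wind values, which is exactly what makes this approach work.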

After all these gap-filling procedures we have got the following result:

Restored time series (it can be seen that they run in opposite phase, and the filled-in gap does not violate this principle) – (image by author)

Now both time series have no gaps and are ready for further use.

Forecast

Let’s use all the FEDOT features described above and run the AutoML algorithm on our data. We launched FEDOT with the default configuration for time series forecasting, using just the fit and predict methods of the API. Now let’s look at the resulting forecast and calculate the metrics, the mean absolute error (MAE) and the root mean square error (RMSE): MAE is 100.52 and RMSE is 120.42.

Example of a time series forecast (image by author)

If we look at the plot and the values of the metrics, the question arises: is the model good or not?

Answer: it is difficult to say. It is better not to validate the model on one small sample of only 14 values; the metric should be calculated at least several times, for instance three times on 14 values each (42 in total). To do this, you should use in-sample forecasting.

Advanced validation

Below is an animation that should help you understand the difference between out-of-sample and in-sample forecasting:

Animation. In-sample and out-of-sample forecasting process (animation by author)

So, our model can forecast 14 values ahead. If we want a forecast for 28 values, we can iteratively make the 14-element forecast twice; in this case, the values predicted in the first iteration (out-of-sample) serve as predictors for the second forecast.

To validate the model, we use in-sample forecasting. With this approach, we predict an already-known part of the time series (the test sample); however, at each iteration the known values, rather than the values predicted at the previous step, are used to form the predictors for the next step.
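The difference between the two schemes can be sketched in a few lines (a plain-Python illustration with a hypothetical naive model, not FEDOT's implementation):

```python
def iterative_forecast(history, model, horizon, step, actual=None):
    """Produce `horizon` predictions, `step` values at a time.
    Out-of-sample (actual=None): predictions feed the next iteration.
    In-sample (actual given): the known values feed the next iteration."""
    known = list(history)
    forecast = []
    for start in range(0, horizon, step):
        forecast.extend(model(known, step))
        if actual is None:
            known.extend(forecast[-step:])            # out-of-sample
        else:
            known.extend(actual[start:start + step])  # in-sample
    return forecast

# Hypothetical "model": repeat the last known value `step` times
naive = lambda known, step: [known[-1]] * step

out = iterative_forecast([1, 2, 3], naive, horizon=4, step=2)
ins = iterative_forecast([1, 2, 3], naive, horizon=4, step=2,
                         actual=[10, 20, 30, 40])
# out -> [3, 3, 3, 3]; ins -> [3, 3, 20, 20]
```

Note how the in-sample forecast "resets" onto the true values after each block, which is what makes it suitable for computing validation metrics over several consecutive blocks.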

This approach is also implemented in FEDOT, so now we will test the algorithm on three blocks of 14 values each. To do this, we divide the sample and run the composer again. The prediction result is shown in the figure below. It is important to note that evolutionary algorithms are stochastic, so the output of the AutoML model may differ from run to run.

Validation of a composite model for a time series on three blocks of 14 elements. The right side of the original time series is shown (image by author)

The forecast on the first validation block perfectly repeated the actual values of the time series. This seems strange, but everything becomes clear as soon as we look at the structure of the obtained pipeline.

Examples of the pipelines obtained during the composing (the process of evolution). There were considered both a pipeline with time-series-specific preprocessing operations and simple pipelines that represent linear relationships (image by author)

As can be seen from the figure, more complex pipelines do not always provide the lowest error metrics. The best pipeline found turned out to be short, yet its error on validation was small. From this we conclude that such a pipeline is sufficient for this time series.

Since the final model is a K-nearest neighbors algorithm, the pipeline is well able to reproduce time series patterns from the training sample. Problems with such a model may arise, for example, with time series that are non-stationary in trend: the K-nearest neighbors model cannot adequately extrapolate dependencies beyond the training sample. This time series has another feature: it is non-stationary in variance.

However, its structure contains relatively homogeneous parts that are not much different from the part of the time series on which the validation was performed.

Homogeneous parts of the time series that are "similar" to the validation section are highlighted in orange (image by author)

In these parts, there are repeated patterns, and the time series is trend-stationary: the value fluctuates around the average, rising above 1000 kWh and then falling to 0. Therefore, the ability to reproduce these patterns is very important for the constructed pipeline, while it is not necessary to guess the low-frequency fluctuations of the series (for example, the trend or seasonality). The KNN model is suitable for such a task. The forecast quality metrics obtained after chain composition are MAE of 88.19 and RMSE of 177.31.

It is important to note that we obtained this solution in fully automatic mode, without adding any expert knowledge to the search algorithm. The task was solved in just 5 minutes of the framework running on a laptop; undoubtedly, for large datasets, composing a good pipeline will take more time.

Comparison with competitors

Disclaimer: The comparison in this section is far from exhaustive. To justify that one framework is better or worse than another, you need to conduct a lot more experiments. It is advisable to use more than one data source, apply cross-validation, run algorithms on the same data and with the same parameters several times (with averaging of metrics). Here we have an introductory comparison: we showed how alternative solutions can cope with the task. If you are interested in how FEDOT deals with time series in comparison with other frameworks, follow the news in ResearchGate. A full-fledged comparison in a scientific paper will be available soon!

Let’s compare FEDOT with other open-source frameworks for time series forecasting: AutoTS and pmdarima. A Jupyter notebook with the code, as well as the plots, is available via link. Since not all of these libraries implement validation on multiple blocks, we decided to make this small comparison on just one fragment of the time series. Each algorithm was run 3 times, and the error metrics were averaged. The table with the metrics (the cells also show the standard deviation, std) looks like this:

The figure also shows the predictions for one of the experiments:

Example of forecasts obtained by competing algorithms (image by author)

It can be seen in the figure that the forecast obtained with FEDOT is more "similar to the actual data".

Conclusion

So, today we have looked at AutoML, an increasingly popular area of machine learning. In this post, we reviewed existing solutions for the automatic generation of ML pipelines and figured out how they can be used for time series forecasting.

We also tried AutoML on the example of forecasting electricity generation series using the FEDOT framework: we restored the missing values, built a pipeline using an evolutionary algorithm, and validated the solution. Finally, we demonstrated a brief comparison of FEDOT with other frameworks on this task.

Examples (code and plots) from this post are available in a GitHub repository via link.

A couple of additional links for those who decided to go deeper:

  • Github repository with FEDOT
  • Github repository of the developed web module for the framework – FEDOT.Web
  • Chat for the discussion and problem-solving for FEDOT

Use AutoML, try FEDOT!

Mikhail Sarafanov, Pavel Vychuzhanin, and Nikolay Nikitin worked on the article.

