
Supervised Learning with Time Series
Supervised learning involves training a machine learning model on an input data set. This data set is usually a matrix: a two-dimensional data structure composed of rows (samples) and columns (features).
A time series is a sequence of values ordered in time, so it needs to be transformed into that format before supervised learning can be applied.
In a previous article, we learned how to transform a univariate time series from a sequence into a matrix. This is done with a sliding window: each observation of the series is modeled based on its most recent past values, also called lags.
Here’s an example of this transformation using the sequence from 1 to 10:

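To make that concrete, here's a minimal sketch of the sliding window with pandas, assuming 5 lags and a single-step target (the t-4, …, t+1 labels are just illustrative):
import pandas as pd
# the sequence from 1 to 10
series = pd.Series(range(1, 11))
# sliding window: 5 lags as explanatory variables, the next value as the target
window = {f't-{i}': series.shift(i + 1) for i in range(4, 0, -1)}
window['t'] = series.shift(1)
window['t+1'] = series
mat = pd.concat(window, axis=1).dropna()
print(mat)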
This transformation enables a type of modeling called auto-regression. In auto-regression, a model uses the recent past values (lags) of a time series as explanatory variables to predict future observations (the target variable). The name comes from the fact that the series is regressed on itself.
In the example above, the lags are the initial 5 columns. The target variable is the last column (the value of the series in the next time step).
Auto-Regression with Deep Learning
While most methods work with matrices, deep neural networks need a different structure.
The input to deep neural networks such as LSTMs or CNNs is a three-dimensional array. The actual data is the same as what you'd put in a matrix, but it's structured differently.
Besides rows (samples) and columns (lags), the extra dimension refers to the number of variables in the series. In a matrix, you concatenate all attributes together irrespective of their source. Neural networks are a bit tidier. The input is organized by each variable in the series using a third dimension.
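For intuition, here's a minimal sketch of the shapes involved, assuming 100 samples, 3 lags, and a series with 2 variables:
import numpy as np
# a hypothetical LSTM input: 100 samples, 3 lags, 2 variables
X_3d = np.random.random((100, 3, 2))
# axis 0: samples, axis 1: lags, axis 2: variables of the series
print(X_3d.shape)  # (100, 3, 2)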
Let’s do a practical example to make this clear.
Hands-On

In this tutorial, you’ll learn how to transform a time series for supervised learning with an LSTM (Long Short-Term Memory). An LSTM is a type of recurrent neural network that is especially useful for modeling time series.
We’ll split the time series transformation process into two steps:
- From a sequence of values into a matrix;
- From a matrix into a 3-d array for deep learning.
First, we’ll do an example with a univariate time series. Multivariate time series are covered next.
Univariate Time Series
Let’s start by reading the data. We’ll use a time series related to the sales of different kinds of wine. You can check the source in reference [1].
import pandas as pd
# https://github.com/vcerqueira/blog/tree/main/data
data = pd.read_csv('data/wine_sales.csv', parse_dates=['date'])
data.set_index('date', inplace=True)
series = data['Sparkling']
For the univariate example, we focus on the sales of sparkling wine. This time series looks like this:

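If you want to plot the series yourself, pandas' plot method is enough (matplotlib is assumed to be installed, and the labels below are just illustrative):
import matplotlib.pyplot as plt
# plotting the sparkling wine sales series
series.plot(title='Sparkling wine sales')
plt.ylabel('Sales')
plt.show()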
From a sequence of values into a matrix
We apply a sliding window to transform this series for supervised learning. You can learn more about this process in a previous article.
# src module here: https://github.com/vcerqueira/blog/tree/main/src
from src.tde import time_delay_embedding
# using 3 lags as explanatory variables
N_LAGS = 3
# forecasting the next 2 values
HORIZON = 2
# using a sliding window method called time delay embedding
X, Y = time_delay_embedding(series, n_lags=N_LAGS, horizon=HORIZON, return_Xy=True)
Here’s a sample of the explanatory variables (X) and corresponding target variables (Y):

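You can inspect both directly. Assuming the embedding names its columns with the (t-…), (t), and (t+…) convention used later in this article, X holds the lags and Y the future values:
# lag-based features, e.g. Sparkling(t-2), Sparkling(t-1), Sparkling(t)
print(X.head())
# future values to forecast, e.g. Sparkling(t+1), Sparkling(t+2)
print(Y.head())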
This data set is the basis for training traditional machine learning methods, such as a linear regression model or xgboost.
from sklearn.linear_model import RidgeCV
# training a ridge regression model
model = RidgeCV()
model.fit(X, Y)
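Since Y has one column per forecasting step, the fitted model predicts both horizons at once. Here's a quick sketch of how it can be used:
# forecasting the next HORIZON values from the most recent lags
last_obs = X.tail(1)
forecast = model.predict(last_obs)
print(forecast)  # one prediction per forecasting horizon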
From a matrix into a 3-d structure for deep learning
You need to reshape this data set to train a neural network like an LSTM. The following function can be used to do this:
import re
import pandas as pd
import numpy as np
def from_matrix_to_3d(df: pd.DataFrame) -> np.ndarray:
    """
    Transforming a time series from matrix into a 3-d structure for deep learning
    :param df: (pd.DataFrame) Time series in the matrix format after embedding
    :return: Reshaped time series in a 3-d structure
    """
    cols = df.columns
    # getting the unique variables in the time series
    # this list has a single element for univariate time series
    var_names = np.unique([re.sub(r'\([^)]*\)', '', c) for c in cols]).tolist()
    # getting the observations of each variable
    arr_by_var = [df.loc[:, cols.str.contains(v)].values for v in var_names]
    # reshaping the data of each variable into a 3-d format
    arr_by_var = [x.reshape(x.shape[0], x.shape[1], 1) for x in arr_by_var]
    # concatenating the arrays of each variable into a single array
    ts_arr = np.concatenate(arr_by_var, axis=2)
    return ts_arr
# transforming the matrices
X_3d = from_matrix_to_3d(X)
Y_3d = from_matrix_to_3d(Y)
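As a quick sanity check, X_3d should have the lags along the second axis and Y_3d the forecasting horizon, with a single variable along the third one:
# (n_samples, N_LAGS, 1) for X and (n_samples, HORIZON, 1) for Y in the univariate case
print(X_3d.shape)
print(Y_3d.shape)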
Finally, you can train an LSTM using the resulting data set:
from sklearn.model_selection import train_test_split
from keras.models import Sequential
# Dropout is also imported here because it is used in the multivariate example below
from keras.layers import (Dense,
                          LSTM,
                          TimeDistributed,
                          RepeatVector,
                          Dropout)
# number of variables in the time series
# 1 because the series is univariate
N_FEATURES = 1
# creating a simple stacked LSTM
model = Sequential()
model.add(LSTM(8, activation='relu', input_shape=(N_LAGS, N_FEATURES)))
model.add(RepeatVector(HORIZON))
model.add(LSTM(4, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(N_FEATURES)))
# compiling the model
model.compile(optimizer='adam', loss='mse')
# basic train/validation split
X_train, X_valid, Y_train, Y_valid = train_test_split(X_3d, Y_3d, test_size=.2, shuffle=False)
# training the model
model.fit(X_train, Y_train, epochs=100, validation_data=(X_valid, Y_valid))
# making predictions
preds = model.predict_on_batch(X_valid)
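The predictions come back with the same three-dimensional layout as the targets. For a univariate series, you can drop the last axis before evaluating them, for example with the mean absolute error:
# predictions have shape (n_valid_samples, HORIZON, N_FEATURES)
preds_2d = preds.squeeze(axis=-1)
actual_2d = Y_valid.squeeze(axis=-1)
# mean absolute error on the validation set
print(np.abs(preds_2d - actual_2d).mean())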
Multivariate Time Series
Now, let’s look at a multivariate time series example. In this case, the goal is to forecast the future values of several variables, not just one. So, you need a model for multivariate and multi-step forecasting.

The transformation process is similar to the univariate case.
To transform the multivariate time series into a matrix format, you can apply the sliding window approach to each variable. Then, you combine all resulting matrices into a single one.
Here’s an example:
# transforming each variable into a matrix format
mat_by_variable = []
for col in data:
    col_df = time_delay_embedding(data[col], n_lags=N_LAGS, horizon=HORIZON)
    mat_by_variable.append(col_df)
# concatenating all variables
mat_df = pd.concat(mat_by_variable, axis=1).dropna()
# defining target (Y) and explanatory variables (X)
predictor_variables = mat_df.columns.str.contains(r'\(t-|\(t\)')
target_variables = mat_df.columns.str.contains(r'\(t\+')
X = mat_df.iloc[:, predictor_variables]
Y = mat_df.iloc[:, target_variables]
Here's what the explanatory variables look like for two of the variables (the others are omitted for brevity):

You can use the same function to transform the data into three dimensions:
X_3d = from_matrix_to_3d(X)
Y_3d = from_matrix_to_3d(Y)
The training part is also similar to before. The number of variables in the series is passed to the network through the N_FEATURES constant, which now needs to be set to the number of variables in the multivariate series.
# setting the number of variables to the number of columns in the series
N_FEATURES = data.shape[1]
model = Sequential()
model.add(LSTM(8, activation='relu', input_shape=(N_LAGS, N_FEATURES)))
model.add(Dropout(.2))
model.add(RepeatVector(HORIZON))
model.add(LSTM(4, activation='relu', return_sequences=True))
model.add(Dropout(.2))
model.add(TimeDistributed(Dense(N_FEATURES)))
model.compile(optimizer='adam', loss='mse')
X_train, X_valid, Y_train, Y_valid = train_test_split(X_3d, Y_3d, test_size=.2, shuffle=False)
model.fit(X_train, Y_train, epochs=500, validation_data=(X_valid, Y_valid))
preds = model.predict_on_batch(X_valid)
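Again, the predictions follow the three-dimensional layout (samples, horizon, variables). To get the forecasts of a single variable, index the last axis (the position below is just an example; it depends on the variable order produced by from_matrix_to_3d):
# forecasts of one of the variables, across all forecasting horizons
single_var_preds = preds[:, :, 0]
print(single_var_preds.shape)  # (n_valid_samples, HORIZON)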
The following plot shows a sample of the one-step-ahead forecasts.

The forecasts are not that good. The time series is small and we didn’t optimize the model in any way. Deep learning methods are known to be data-hungry. So, if you go for this kind of approach, make sure you have enough data.
Key Takeaways
- Deep learning is increasingly relevant in time series applications.
- In this article, we explored how to transform a time series for deep learning.
- The input to traditional machine learning algorithms is a matrix. But neural networks such as LSTMs work with three-dimensional data sets, so time series need to be transformed from a sequence into this format.
- The transformation is based on a sliding window that is applied to each variable in the series.
Thank you for reading, and see you in the next story!
Related Articles
- Machine Learning for Forecasting: Transformations and Feature Extraction
- Machine Learning for Forecasting: Supervised Learning with Multivariate Time Series
References
[1] Rob Hyndman and Yangzhuoran Yang (2018). tsdl: Time Series Data Library. v0.1.0. (GPL-3 Licence)