Multivariate Timeseries Forecast with Lead and Lag Timesteps Using LSTM

NandaKishore Joshi
Towards Data Science
May 16, 2021


Why multivariate, and how can it help make better predictions?

Time series forecasting plays a critical role in decision making in most industries. For example, forecasting the number of containers to be purchased can save millions for a shipping business. Similarly, forecasting the demand for a particular product type plays a very important role in pricing, and hence profitability, for an e-commerce company.

In most cases it is the business or the operations team who knows the factors that affect demand or supply. Simply making the forecast based on historical patterns may not always yield the desired output, and it might not consider future prospects; there is a chance that past mistakes get repeated in future forecasts. It is always good to consider the factors in effect and to give the team the ability to play around with them and understand their impact on the predictions.

This is where the multivariate timeseries forecast comes into the picture. Let us understand the multivariate forecast using the images below.

Figure 1: Multivariate Timeseries Forecast with lag data (lag=5 steps)

Figure 1 depicts the multivariate timeseries forecast of the dependent variable Y at time t with a lag of 5. The cell in red is the value to be forecasted at time t, which depends on the values in the yellow cells (t-5 to t). These are the independent variables affecting the prediction of Y at t.

We can consider a multivariate timeseries as a regression problem whose independent variables are the features of the previous lag steps (up to t-1) along with the independent values at time t. With this approach there is a lot more control over the forecast than with just the previous timestamps.
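As a quick illustration of this framing, lag features can be built with a plain pandas shift. Below is a minimal sketch on a made-up univariate series, purely for illustration (the general helper function comes later):

import pandas as pd

# toy series, purely illustrative
df = pd.DataFrame({'y': [10, 12, 13, 15, 14, 16]})
# each shifted column becomes an independent variable for predicting y at time t
df['y(t-1)'] = df['y'].shift(1)
df['y(t-2)'] = df['y'].shift(2)
print(df.dropna())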

Below is the multivariate timeseries which also considers the lead values:

Figure 2: Multivariate timeseries with lead and lag features

From the above figure we can see that, along with the lag features, lead=2 (t+2) timesteps are also considered to make the forecast. This gives us more control over the factors affecting the forecast. In many cases some future factors are known to affect the current prediction; a planned promotion or a scheduled holiday, for instance, is known in advance and can be fed in as a lead feature. With these approaches, the decision-making team can really simulate the forecast based on various input values of the independent features.

Implementation of the Forecast Model Using LSTM

Now let us see how to implement the multivariate timeseries with both lead and lag features.

  1. Getting the data ready with lead and lag factors

The major difference between using an LSTM for a plain regression task and for a timeseries is that, for a timeseries, the lead and lag timestep data needs to be considered. Let's define a function which does exactly this, taking the lead and lag as parameters:

from pandas import DataFrame, concat

# convert series to supervised learning
def series_to_supervised(data, n_lag=1, n_lead=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_lag, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_lead):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

The above function converts the data into a timeseries frame with customized n_lag and n_lead steps. The output of this function contains the lag and lead steps as columns labelled with (t-n) or (t+n) timestamps.
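For instance, on a made-up two-variable array the reframing looks like this (a quick sanity check, not part of the original pipeline):

import numpy as np

# hypothetical two-variable series: 5 timesteps, 2 features
raw = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]])
print(series_to_supervised(raw, n_lag=2, n_lead=2))
# columns: var1(t-2), var2(t-2), var1(t-1), var2(t-1),
#          var1(t),   var2(t),   var1(t+1), var2(t+1)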

reframed = series_to_supervised(values, n_lag, (n_lead + 1))

# removing the future (t+n) dependent variable (Y)
if n_lead > 0:
    future_y_cols = [i for i in range(df_no.shape[1] * (n_lag + 1),
                                      reframed.shape[1],
                                      df_no.shape[1])]
    reframed = reframed.drop(reframed.columns[future_y_cols], axis=1)

The above code drops the future Y (at t+1 ... t+n) while training the model; here df_no.shape[1] is the number of variables per timestep in the original dataframe. Once we drop the future Y and have the reframed data, it is as simple as training the LSTM for a regression problem.
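To see why that range expression picks out exactly the future target columns, here is the index arithmetic traced on an assumed toy case (2 variables, n_lag=2, n_lead=2):

# series_to_supervised(values, 2, 3) returns 10 columns laid out as:
#   indices 0-3 : var1/var2 at t-2 and t-1   (lag features)
#   indices 4-5 : var1(t), var2(t)           (current step)
#   indices 6-9 : var1/var2 at t+1 and t+2   (lead features)
n_vars, n_lag, n_cols = 2, 2, 10
future_y_idx = [i for i in range(n_vars * (n_lag + 1), n_cols, n_vars)]
print(future_y_idx)  # [6, 8] -> var1(t+1) and var1(t+2), the future values of Y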

# splitting reframed into X and Y, considering the first column (var1)
# to be our target feature
X = reframed.drop(['var1(t)'], axis=1)
Y = reframed['var1(t)']
X_values = X.values
Y_values = Y.values

# n_predict being the test length
train_X, train_Y = X_values[:(X_values.shape[0] - n_predict), :], Y_values[:(X_values.shape[0] - n_predict)]
test_X, test_Y = X_values[(X_values.shape[0] - n_predict):, :], Y_values[(X_values.shape[0] - n_predict):]

# reshaping train and test to [samples, timesteps, features] to feed to the LSTM
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

Creating a simple LSTM model

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
from keras.optimizers import Adam

opt = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)
model = Sequential()
# stacked LSTM: two layers return sequences, the last one returns a single vector
model.add(LSTM(100, return_sequences=True, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dropout(0.25))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.20))
model.add(LSTM(units=10, return_sequences=False))
model.add(Dense(units=1, activation='linear'))
model.compile(loss='mae', optimizer=opt)
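Since each sample was reshaped to a single timestep, input_shape here is (1, number of reframed features). A quick summary (standard Keras) is an easy way to verify the stacked architecture and parameter count before training:

model.summary()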

Once the model is ready we can train it on the train data and test it on the test data. The code below shows some training callbacks which can be used to help train a good model (note that only the checkpoint and learning-rate callbacks are actually passed to fit here):

from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint, TensorBoard

# adding a few model checkpoints
es = EarlyStopping(monitor='val_loss', min_delta=1e-10, patience=10, verbose=1)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.01, patience=10, verbose=1)
mcp = ModelCheckpoint(filepath="/test.h5", monitor='val_loss', verbose=1,
                      save_best_only=True, save_weights_only=False)
tb = TensorBoard('logs')

history = model.fit(train_X, train_Y, epochs=50, batch_size=10,
                    callbacks=[mcp, rlr], validation_data=(test_X, test_Y),
                    verbose=2, shuffle=False)
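To inspect how training went, a minimal sketch (assuming matplotlib is available) plots the training and validation loss from the returned History object:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('epoch')
plt.ylabel('MAE loss')
plt.legend()
plt.show()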

Once the model is trained, we can get the predictions for our test data:

yhat = model.predict(test_X)
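A minimal evaluation sketch using scikit-learn (this assumes the inputs were not scaled; if a scaler was applied beforehand, invert it on both yhat and test_Y first):

from math import sqrt
from sklearn.metrics import mean_squared_error

rmse = sqrt(mean_squared_error(test_Y, yhat))
print('Test RMSE: %.3f' % rmse)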

Summary

In this article we saw what a multivariate timeseries is and how to use both lead and lag data to make predictions. Some points to note while using this approach:

  1. As n_lead and n_lag increase, the number of features for a particular prediction also increases. For example, if we have 5 independent features at every timestamp and we consider n_lag=5 and n_lead=2, then the overall number of columns after reframing will be 5 + 5*(n_lag) + 5*(n_lead), which in this case is 40 (see the sketch after this list).
  2. A good amount of training data is required, as using lag and lead steps reduces the number of training rows.
  3. The LSTM model architecture has to be chosen wisely to avoid overfitting, as the number of features increases or decreases every time we change n_lead and n_lag.
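The column count from point 1 can be checked directly with the series_to_supervised helper on hypothetical data:

import numpy as np

# hypothetical data: 100 timesteps, 5 independent features per timestep
values = np.random.rand(100, 5)
reframed = series_to_supervised(values, n_lag=5, n_lead=2 + 1)  # 2 lead steps plus t
print(reframed.shape[1])  # 5 + 5*5 + 5*2 = 40 columns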
