The world’s leading publication for data science, AI, and ML professionals.

Predictive Analytics: Time-Series Forecasting with GRU and BiLSTM in TensorFlow

A step-by-step tutorial on building GRU and Bidirectional LSTM models for time-series forecasting

Photo by Enrique Alarcon on Unsplash

Recurrent Neural Networks are designed to handle the complexity of sequence dependence in time-series analysis. In this tutorial, I build GRU and BiLSTM models for univariate time-series forecasting. The Gated Recurrent Unit (GRU) is a newer type of recurrent unit that is quite similar to Long Short-Term Memory (LSTM), while the idea of Bidirectional LSTMs (BiLSTM) is to aggregate input information from both the past and the future of a specific time step in LSTM models.

The following article serves as a good introduction to LSTM, GRU and BiLSTM.

Predictive Analytics: LSTM, GRU and Bidirectional LSTM in TensorFlow

What is time-series analysis?

Unlike regression analysis, in time-series analysis, we do not have strong evidence of what affects our target. A time-series analysis uses time as one of the variables in order to see if there is a change over time.

What is time-series forecasting?

The purpose of time-series forecasting is to fit a model on historical data and use it to predict future observations. This post is dedicated to time-series forecasting using deep learning methods. If you would like to learn about classical methods for time-series forecasting, I suggest you read this webpage.

☺ Let’s code❗

👩‍💻 Python Code on GitHub

Dataset

For this project, the data is the daily water consumption of the city of Brossard, Quebec, Canada, recorded from 2011-09-01 to 2015-09-30.

Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import warnings
warnings.filterwarnings('ignore')
from scipy import stats
%matplotlib inline
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential, layers, callbacks
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional

Set random seed

tf.random.set_seed(1234)

1. Read and explore data

In this project, I am dealing with univariate time-series data. When I import the data from a CSV file, I make sure the Date column has the correct DateTime format by passing parse_dates = ['Date']. Also, working with dates and times becomes much easier once the Date column is set as the dataframe index.

df = pd.read_csv('Data.csv', parse_dates = ['Date'])
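As a minimal sketch of the index idea, the snippet below reads a tiny in-memory CSV in place of Data.csv and sets Date as the index. The three rows and their values are made up for illustration, and the names demo and demo_indexed are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for Data.csv: same column names, made-up values
csv_text = "Date,WC\n2011-09-01,0.50\n2011-09-02,0.55\n2011-09-03,0.52\n"

demo = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
demo_indexed = demo.set_index('Date')

# With a DatetimeIndex, date parts and date-based slicing are directly available
print(demo_indexed.index.month.tolist())
print(demo_indexed.loc['2011-09-02':, 'WC'].tolist())
```

With a DatetimeIndex in place, attributes such as .month and methods such as resample can be used without extra conversion.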

1.1 Time-series plot

To better understand the data, I plot daily, monthly and yearly water consumption.

# Define a function to draw time_series plot
def timeseries (x_axis, y_axis, x_label):
    plt.figure(figsize = (10, 6))
    plt.plot(x_axis, y_axis, color ='black')
    plt.xlabel(x_label, {'fontsize': 12}) 
    plt.ylabel('Water consumption ($m³$/capita.day)', 
                                  {'fontsize': 12})
# Use the Date column as the index so that .month, .year and resample work
dataset = df.set_index('Date')
timeseries(dataset.index, dataset['WC'], 'Time (day)')
dataset['month'] = dataset.index.month
dataset_by_month = dataset.resample('M').sum()
timeseries(dataset_by_month.index, dataset_by_month['WC'], 
           'Time (month)')
dataset['year'] = dataset.index.year
dataset_by_year = dataset.resample('Y').sum()
timeseries(dataset_by_year.index, dataset_by_year['WC'], 
           'Time (year)')
Time-series of daily water consumption

You can see that the data has a seasonal pattern.

Time-series of monthly water consumption
Time-series of yearly water consumption

1.2 Handle missing values

First, I want to check for the number of missing values and determine on what date no data value is stored. Then I use linear interpolation to replace missing values.

# Check for missing values
print('Total num of missing values:') 
print(df.WC.isna().sum())
print('')
# Locate the missing value
df_missing_date = df.loc[df.WC.isna()]
print('The date of missing value:')
print(df_missing_date.loc[:,['Date']])
# Replace missing value with linear interpolation
df['WC'] = df['WC'].interpolate()
# Keep WC and drop Date
df = df.drop('Date', axis = 1)
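To see what linear interpolation does to a gap, here is a small sketch on made-up numbers (the series wc below is hypothetical and is not the Brossard data).

```python
import numpy as np
import pandas as pd

# Made-up values with one gap, standing in for the WC column
wc = pd.Series([10.0, 12.0, np.nan, 16.0])

# The default method is 'linear': the gap is filled on the straight line
# between its neighbours, treating the rows as equally spaced
filled = wc.interpolate()
print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0]
```

For daily data with an occasional missing day, this amounts to averaging the day before and the day after.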

1.3 Split the dataset into train and test data

Like I usually do, I set the first 80% of data as train data and the remaining 20% as test data. I train the model with train data and validate its performance with test data.

💡 As a reminder, you have to use iloc to select a subset of the dataframe based on index position.

# Split train data and test data
train_size = int(len(df)*0.8)

train_data = df.iloc[:train_size]
test_data = df.iloc[train_size:]
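The split is chronological, so the test period always comes after the training period. A toy sketch with ten made-up rows (the names toy, toy_train and toy_test are hypothetical) shows where the cut falls:

```python
import pandas as pd

# Ten made-up observations in chronological order
toy = pd.DataFrame({'WC': range(10)})

split = int(len(toy) * 0.8)
toy_train = toy.iloc[:split]   # positions 0..7
toy_test = toy.iloc[split:]    # positions 8..9

print(len(toy_train), len(toy_test))
```

No shuffling is involved, which matters for time series: the model is never trained on observations that come after the ones it is tested on.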

1.4 Data transformation

A good rule of thumb is that normalized data lead to better performance in Neural Networks. In this project, I use MinMaxScaler from scikit-learn.

You need to follow three steps to perform data transformation:

  • Fit the scaler (MinMaxScaler) using available training data (It means that the minimum and maximum observable values are estimated using training data.)
  • Apply the scaler to training data
  • Apply the scaler to test data

💡 It is important to note that the input to MinMaxScaler().fit() can be array-like or dataframe of shape (n_samples, n_features). In this project:

train_data.shape = (1192, 1)

scaler = MinMaxScaler().fit(train_data)
train_scaled = scaler.transform(train_data)
test_scaled = scaler.transform(test_data)
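To make the three steps concrete, here is a NumPy-only sketch of the arithmetic that MinMaxScaler performs with its default feature_range of (0, 1). The numbers are made up and minmax_transform is a hypothetical helper, not part of scikit-learn. The point is that the minimum and maximum come from the training data only, so test values outside the training range map outside [0, 1].

```python
import numpy as np

train = np.array([[2.0], [4.0], [6.0]])   # made-up training values
test = np.array([[5.0], [8.0]])           # 8.0 exceeds the training maximum

# "Fit": estimate min and max from the training data only
data_min = train.min(axis=0)
data_max = train.max(axis=0)

def minmax_transform(x):
    # Same formula for train and test: (x - min) / (max - min)
    return (x - data_min) / (data_max - data_min)

print(minmax_transform(train).ravel())
print(minmax_transform(test).ravel())
```

Fitting on the training data alone keeps information about the test period out of the preprocessing step.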

1.5 Create input

GRU and BiLSTM take a 3D input (num_samples, num_timesteps, num_features). So, I create a helper function, create_dataset, to reshape the input.

In this project, I define look_back = 30. It means that the model makes predictions based on the data of the last 30 days (in the first iteration of the for-loop, the input carries the first 30 days and the output is water consumption on the 31st day).

# Create input dataset
def create_dataset (X, look_back = 1):
    Xs, ys = [], []

    for i in range(len(X)-look_back):
        v = X[i:i+look_back]
        Xs.append(v)
        ys.append(X[i+look_back])

    return np.array(Xs), np.array(ys)
LOOK_BACK = 30
X_train, y_train = create_dataset(train_scaled,LOOK_BACK)
X_test, y_test = create_dataset(test_scaled,LOOK_BACK)
# Print data shape
print('X_train.shape: ', X_train.shape)
print('y_train.shape: ', y_train.shape)
print('X_test.shape: ', X_test.shape) 
print('y_test.shape: ', y_test.shape)
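To check the shapes by hand, the sketch below applies the same sliding-window logic to a six-value toy series with a window of 3. The helper make_windows repeats the logic of create_dataset under a different, hypothetical name so that the example runs on its own.

```python
import numpy as np

def make_windows(series, look_back):
    # Same sliding-window logic as create_dataset
    Xs, ys = [], []
    for i in range(len(series) - look_back):
        Xs.append(series[i:i + look_back])
        ys.append(series[i + look_back])
    return np.array(Xs), np.array(ys)

toy = np.arange(6, dtype=float).reshape(-1, 1)   # values 0..5, shape (6, 1)
X_toy, y_toy = make_windows(toy, look_back=3)

print(X_toy.shape, y_toy.shape)          # (3, 3, 1) (3, 1)
print(X_toy[0].ravel(), y_toy[0])        # first window and its target
```

A series of length n yields n - look_back samples, and each target is the value that immediately follows its window.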

2. Create models

The first function, create_bilstm, creates a BiLSTM and takes the number of units (neurons) in the hidden layers. The second function, create_gru, builds a GRU and takes the number of units in the hidden layers.

Both have 64 neurons in the input layer, one hidden layer including 64 neurons and 1 neuron in the output layer. The optimizer in both models is adam. To make the GRU model more robust, Dropout layers are used: Dropout(0.2) randomly drops 20% of units from the network during training.
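As a rough illustration of what a dropout rate of 0.2 means during training, the NumPy sketch below zeroes each activation with probability 0.2 and rescales the survivors by 1/(1 - 0.2) so that the expected sum is unchanged, which is the behaviour documented for the Keras Dropout layer. It mimics the idea only and is not the Keras implementation; the array of ones is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2
activations = np.ones(10_000)   # made-up activations, all equal to 1

# Keep each unit with probability 1 - rate, then rescale the kept units
keep_mask = rng.random(activations.shape) >= rate
dropped = np.where(keep_mask, activations / (1.0 - rate), 0.0)

print('fraction zeroed:', 1.0 - keep_mask.mean())
print('mean after dropout:', dropped.mean())
```

Roughly 20% of the values become zero and the rest become 1.25, so the average stays close to 1. At prediction time the layer passes its input through unchanged.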

# Create BiLSTM model
def create_bilstm(units):
    model = Sequential()
    # Input layer
    model.add(Bidirectional(
              LSTM(units = units, return_sequences=True), 
              input_shape=(X_train.shape[1], X_train.shape[2])))
    # Hidden layer
    model.add(Bidirectional(LSTM(units = units)))
    model.add(Dense(1))
    #Compile model
    model.compile(optimizer='adam',loss='mse')
    return model
model_bilstm = create_bilstm(64)
# Create GRU model
def create_gru(units):
    model = Sequential()
    # Input layer
    model.add(GRU (units = units, return_sequences = True, 
    input_shape = [X_train.shape[1], X_train.shape[2]]))
    model.add(Dropout(0.2)) 
    # Hidden layer
    model.add(GRU(units = units)) 
    model.add(Dropout(0.2))
    model.add(Dense(units = 1)) 
    #Compile model
    model.compile(optimizer='adam',loss='mse')
    return model
model_gru = create_gru(64)

2.1 Fit the models

I create a function, fit_model, that takes a model and trains it on the train data for up to 100 epochs with batch_size = 16. I get the model to use 20% of train data as validation data. I set shuffle = False because it gives better performance.

To avoid overfitting, I set an early stop to stop training when validation loss has not improved after 10 epochs (patience = 10).
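The patience rule can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras code: stopping_epoch is a hypothetical helper, the validation losses are made up, and I assume that only a strictly lower loss counts as an improvement.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0          # improvement: reset the counter
        else:
            wait += 1         # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: best value at epoch 2, then no improvement
losses = [0.9, 0.7, 0.5, 0.6, 0.55, 0.52, 0.51]
print(stopping_epoch(losses, patience=3))   # 5
```

With patience = 10, as in this tutorial, the same made-up sequence would not trigger a stop, because there are fewer than 10 epochs without improvement.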

def fit_model(model):
    early_stop = keras.callbacks.EarlyStopping(monitor = 'val_loss',
                                               patience = 10)
    history = model.fit(X_train, y_train, epochs = 100,  
                        validation_split = 0.2,
                        batch_size = 16, shuffle = False, 
                        callbacks = [early_stop])
    return history
history_gru = fit_model(model_gru)
history_bilstm = fit_model(model_bilstm)

2.2 Inverse transform of the target variable

After building the model, I have to transform the target variable back to original data space for train and test data using scaler.inverse_transform.

y_test = scaler.inverse_transform(y_test)
y_train = scaler.inverse_transform(y_train)
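For min-max scaling, the inverse transform simply undoes the scaling formula. A NumPy-only sketch with a made-up training minimum and maximum (not the values from this dataset):

```python
import numpy as np

data_min, data_max = 2.0, 6.0             # made-up training min and max
original = np.array([[2.0], [5.0], [8.0]])

# Forward transform, then its inverse
scaled = (original - data_min) / (data_max - data_min)
restored = scaled * (data_max - data_min) + data_min

print(scaled.ravel())
print(restored.ravel())
```

Bringing both targets and predictions back to the original space means the error measures are expressed in the units of water consumption rather than in the scaled range.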

3. Evaluate models’ performance

How are we going to evaluate the performance of GRU and BiLSTM?

1- Plot train loss and validation loss

To evaluate the model performance, I plot train loss vs validation loss, and I expect to see validation loss lower than training loss 😉

2- Compare prediction vs test data

First, I predict WC using the BiLSTM and GRU models. Then, I plot test data vs prediction for the two models.

3- Calculate RMSE and MAE

I use two goodness-of-fit measures to estimate the accuracy of the models.

3.1 Plot train loss and validation loss

def plot_loss (history, model_name):
    plt.figure(figsize = (10, 6))
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model Train vs Validation Loss for ' + model_name)
    plt.ylabel('Loss')
    plt.xlabel('epoch')
    plt.legend(['Train loss', 'Validation loss'], loc='upper right')

plot_loss (history_gru, 'GRU')
plot_loss (history_bilstm, 'Bidirectional LSTM')

3.2 Compare prediction vs test data

# Make prediction
def prediction(model):
    prediction = model.predict(X_test)
    prediction = scaler.inverse_transform(prediction)
    return prediction
prediction_gru = prediction(model_gru)
prediction_bilstm = prediction(model_bilstm)
# Plot test data vs prediction
def plot_future(prediction, model_name, y_test):
    plt.figure(figsize=(10, 6))
    range_future = len(prediction)
    plt.plot(np.arange(range_future), np.array(y_test), 
             label='Test data')
    plt.plot(np.arange(range_future), 
             np.array(prediction),label='Prediction')
    plt.title('Test data vs prediction for ' + model_name)
    plt.legend(loc='upper left')
    plt.xlabel('Time (day)')
    plt.ylabel('Daily water consumption ($m³$/capita.day)')

plot_future(prediction_gru, 'GRU', y_test)
plot_future(prediction_bilstm, 'Bidirectional LSTM', y_test)

3.3 Calculate RMSE and MAE

def evaluate_prediction(predictions, actual, model_name):
    errors = predictions - actual
    mse = np.square(errors).mean()
    rmse = np.sqrt(mse)
    mae = np.abs(errors).mean()
    print(model_name + ':')
    print('Mean Absolute Error: {:.4f}'.format(mae))
    print('Root Mean Square Error: {:.4f}'.format(rmse))
    print('')
evaluate_prediction(prediction_gru, y_test, 'GRU')
evaluate_prediction(prediction_bilstm, y_test, 'Bidirectional LSTM')
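A worked example on three made-up values shows how the two measures are computed and how RMSE weights the larger error more heavily than MAE does:

```python
import numpy as np

pred = np.array([2.0, 4.0, 6.0])     # made-up predictions
actual = np.array([1.0, 4.0, 8.0])   # made-up observations

errors = pred - actual                       # [ 1.  0. -2.]
mae = np.abs(errors).mean()                  # (1 + 0 + 2) / 3
rmse = np.sqrt(np.square(errors).mean())     # sqrt((1 + 0 + 4) / 3)

print(f'{mae:.4f} {rmse:.4f}')   # 1.0000 1.2910
```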

4. Multi-step forecasting of water consumption in 30 days

To use the trained GRU and BiLSTM models for forecasting, I need at least 60 days of observed data to make predictions for the next 30 days. For the sake of illustration, I select a chunk of 60 days of water consumption from the observed data and forecast water consumption with GRU and BiLSTM for the next 30 days.
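The code in this section uses X_30 and new_data, which are not defined in this post. One possible way to build the model input from a 60-day chunk is sketched below. It assumes the chunk has already been scaled with the fitted scaler; the name new_data_scaled is hypothetical, and random numbers stand in for the observed values.

```python
import numpy as np

LOOK_BACK = 30

# Stand-in for a 60-day chunk that has already been scaled with the fitted
# scaler (random numbers here; in a real workflow this would be observed data)
rng = np.random.default_rng(1234)
new_data_scaled = rng.random((60, 1))

# Slide a 30-day window over the chunk: 60 - 30 = 30 windows
X_30 = np.array([new_data_scaled[i:i + LOOK_BACK]
                 for i in range(len(new_data_scaled) - LOOK_BACK)])

print(X_30.shape)   # (30, 30, 1)
```

With this construction, 60 observed days give 30 input windows, and therefore 30 predictions, each based on 30 observed days.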

# Make prediction for new data
def prediction(model):
    prediction = model.predict(X_30)
    prediction = scaler.inverse_transform(prediction)
    return prediction
prediction_gru = prediction(model_gru)
prediction_bilstm = prediction(model_bilstm)
# Plot history and future
def plot_multi_step(history, prediction1, prediction2):

    plt.figure(figsize=(15, 6))

    range_history = len(history)
    range_future = list(range(range_history, range_history +
                        len(prediction1)))
    plt.plot(np.arange(range_history), np.array(history), 
             label='History')
    plt.plot(range_future, np.array(prediction1),
             label='Forecasted for GRU')
    plt.plot(range_future, np.array(prediction2),
             label='Forecasted for BiLSTM')

    plt.legend(loc='upper right')
    plt.xlabel('Time step (day)')
    plt.ylabel('Water consumption ($m³$/capita.day)')

plot_multi_step(new_data, prediction_gru, prediction_bilstm)

Conclusion

Thank you for reading this article. I hope it helped you to develop GRU and BiLSTM models in TensorFlow for time-series forecasting 😊

Your feedback is greatly appreciated. You can reach me on LinkedIn.

