
Neural Network for inputs of variable length using the TensorFlow TimeDistributed wrapper

A guide on how to deal with the case in which we have inputs (usually signals) of variable length, using the TensorFlow TimeDistributed wrapper.

Table of Contents

  1. Why variable input length?
  2. TensorFlow TimeDistributed Wrapper
  3. Data Generator
  4. References

Why variable input length?

Have you ever wanted to apply a neural network to your dataset, but the data (signals, time series, texts, etc.) had a variable length? Unfortunately, this situation is quite common for a Data Scientist.

As we know, in the real world, data is never as beautiful and organized as we would like.

There are various solutions to handle this problem, but none of them satisfied me.

The most commonly adopted solution is to truncate all inputs to the same length, which usually coincides with the length of the shortest input. However, this causes a huge loss of data, and as we know, data is gold to us.

One possible alternative is the opposite approach: padding (adding data until all signals have the same length). The problem with padding is that it adds data with no real meaning, and with very long inputs the network becomes unsustainable in size. Padding could of course be done via augmentation; however, particularly for signals, where the order of the data is fundamental, applying augmentation tends to "dirty" this information.
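
To make the comparison concrete, here is a toy illustration (with made-up values of my own choosing) of what truncation and padding do, using Keras' pad_sequences:

from tensorflow.keras.preprocessing.sequence import pad_sequences

signals = [[1, 2, 3, 4, 5, 6], [7, 8, 9]]

# Truncating to the shortest length throws away samples (default 'pre' truncation)
print(pad_sequences(signals, maxlen=3))  # [[4 5 6] [7 8 9]]

# Padding to the longest length adds meaningless zeros (default 'pre' padding)
print(pad_sequences(signals))            # [[1 2 3 4 5 6] [0 0 0 7 8 9]]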

I was feeling lost, but when I came across this wrapper, I knew it was the one for me.

TensorFlow TimeDistributed Wrapper

The TimeDistributed wrapper allows you to apply a layer to every temporal slice of an input.
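
To get a feel for what this means, here is a minimal, self-contained sketch (with toy shapes of my own choosing): the same Dense layer is applied independently to each of the temporal slices of the input.

import tensorflow as tf

# Toy input: 4 samples, 10 temporal slices, 8 features each
x = tf.random.normal((4, 10, 8))

# The same Dense layer is applied independently to each of the 10 slices
layer = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(3))
print(layer(x).shape)  # (4, 10, 3)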

Let’s assume that as input we have a dataset composed of signals sampled at 100 Hz (100 points per second). Our goal is to classify each 30-second segment (called an epoch).

We therefore want to build a deep neural network that is applied repeatedly to each of these segments. To clarify how the network works, here is a simplified diagram:

Schema on how the network works [Image by Author]

Let’s start by importing all the necessary elements:

from tensorflow.keras.layers import (Conv2D, TimeDistributed, Dropout, Input, Dense,
                                     BatchNormalization, GRU, Layer, Flatten)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import Adam

Now we can build our network. We will use convolutional blocks (CNN) to extract features from the raw signals, and subsequently Gated Recurrent Units (GRU) to combine the extracted features. So let’s write our function:

def nn(shape_1, shape_2):
    # The first None is the number of epochs, which varies from signal to signal
    input = Input(shape=[None, shape_1, shape_2, 1])

    # Convolutional feature extractor, applied independently to every epoch
    conv1 = TimeDistributed(Conv2D(filters=32, kernel_size=[32, 1], activation='relu', strides=(3, 1)))(input)
    batch1 = TimeDistributed(BatchNormalization())(conv1)

    conv2 = TimeDistributed(Conv2D(filters=32, kernel_size=[32, 1], activation='relu', strides=(2, 1)))(batch1)
    batch2 = TimeDistributed(BatchNormalization())(conv2)

    conv3 = TimeDistributed(Conv2D(filters=32, kernel_size=[32, 1], activation='relu', strides=(2, 1)))(batch2)
    batch3 = TimeDistributed(BatchNormalization())(conv3)

    conv4 = TimeDistributed(Conv2D(filters=32, kernel_size=[32, 1], activation='relu', strides=(2, 1)))(batch3)
    batch4 = TimeDistributed(BatchNormalization())(conv4)

    flat = TimeDistributed(Flatten())(batch4)

    # Recurrent layers combine the features extracted from the sequence of epochs
    gru1 = GRU(256, activation='relu', return_sequences=True, kernel_regularizer=l2(0.01))(flat)
    drop1 = Dropout(rate=0.4)(gru1)
    batch1 = BatchNormalization()(drop1)

    gru2 = GRU(128, activation='relu', return_sequences=True, kernel_regularizer=l2(0.01))(batch1)
    drop2 = Dropout(rate=0.4)(gru2)
    batch2 = BatchNormalization()(drop2)

    # One softmax classification per epoch
    dense = TimeDistributed(Dense(2, activation='softmax'), name='output')(batch2)

    return [input], [dense]

As we can see, the network is made up of four convolutional layers and two GRUs. The network also contains other elements, such as the Batch Normalization layers, which for reasons of time we will not discuss in further detail. Finally, a Dense layer performs the classification. In case you want to classify the entire signal rather than each epoch, you can use the following as the last layer:

dense = Dense(2, activation='sigmoid', name='status_output')(batch2)
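
Note that batch2 still has a time dimension, so a Dense layer applied directly still produces one output per epoch. If a single prediction for the whole signal is what you are after, one possible variant (a sketch of my own, not taken from the original code) is to pool the GRU features over the epochs first:

from tensorflow.keras.layers import GlobalAveragePooling1D

# Average the GRU features over the (variable) number of epochs,
# then classify the whole signal with a single Dense layer
pooled = GlobalAveragePooling1D()(batch2)                              # (batch, 128)
dense = Dense(2, activation='sigmoid', name='status_output')(pooled)  # (batch, 2)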

An important thing to note is that the wrapper should not be applied to temporal layers such as GRU or LSTM: these layers can already handle variable-length inputs by default.
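
To see this concretely, here is a minimal sketch (with toy shapes of my own choosing) showing that the very same GRU layer accepts sequences of different lengths without any wrapper:

import tensorflow as tf

gru = tf.keras.layers.GRU(16, return_sequences=True)

# The same layer handles a 5-step and a 9-step sequence
print(gru(tf.random.normal((1, 5, 8))).shape)  # (1, 5, 16)
print(gru(tf.random.normal((1, 9, 8))).shape)  # (1, 9, 16)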

Once the function is ready, let’s build the model and see it in detail:

EPOCH_LENGTH = 30
SAMPLE_RATE = 100

input, output = nn(SAMPLE_RATE*EPOCH_LENGTH,1)
model = Model(inputs=input,outputs=output)

optimizer = Adam(learning_rate=2*1e-4)

# Compile Model
model.compile(optimizer=optimizer, loss={
                  'output': 'sparse_categorical_crossentropy', },
              metrics={
                  'output': 'sparse_categorical_accuracy', },
              sample_weight_mode='temporal')
model.summary()

Output:

Layer (type)                 Output Shape              Param #   
=================================================================
input_13 (InputLayer)        [(None, None, 3000, 1, 1) 0         
_________________________________________________________________
time_distributed_58 (TimeDis (None, None, 990, 1, 32)  1056      
_________________________________________________________________
time_distributed_59 (TimeDis (None, None, 990, 1, 32)  128       
_________________________________________________________________
time_distributed_60 (TimeDis (None, None, 480, 1, 32)  32800     
_________________________________________________________________
time_distributed_61 (TimeDis (None, None, 480, 1, 32)  128       
_________________________________________________________________
time_distributed_62 (TimeDis (None, None, 225, 1, 32)  32800     
_________________________________________________________________
time_distributed_63 (TimeDis (None, None, 225, 1, 32)  128       
_________________________________________________________________
time_distributed_64 (TimeDis (None, None, 97, 1, 32)   32800     
_________________________________________________________________
time_distributed_65 (TimeDis (None, None, 97, 1, 32)   128       
_________________________________________________________________
time_distributed_66 (TimeDis (None, None, 3104)        0         
_________________________________________________________________
gru_22 (GRU)                 (None, None, 256)         2582016   
_________________________________________________________________
dropout_20 (Dropout)         (None, None, 256)         0         
_________________________________________________________________
batch_normalization_48 (Batc (None, None, 256)         1024      
_________________________________________________________________
gru_23 (GRU)                 (None, None, 128)         148224    
_________________________________________________________________
dropout_21 (Dropout)         (None, None, 128)         0         
_________________________________________________________________
batch_normalization_49 (Batc (None, None, 128)         512       
_________________________________________________________________
output (TimeDistributed)     (None, None, 2)           258       
=================================================================
Total params: 2,832,002
Trainable params: 2,830,978
Non-trainable params: 1,024
_________________________________________________________________

As we can see, the network has 2.8M trainable parameters, which is, all in all, quite a small number. Let’s visualize it graphically:

# Keep only actual Layer instances so that plot_model can draw the graph
model._layers = [
    layer for layer in model._layers if isinstance(layer, Layer)
]

plot_model(model, 'model.png', show_shapes=True)
model shape [Image from Author]

From the shapes of the network, we note that TimeDistributed has added a dimension to the default ones (the second None in the output shapes), which corresponds to the number of epochs and will vary from signal to signal.
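
As a quick sanity check, the model accepts batches with a different number of epochs and returns one prediction per epoch. A small sketch with dummy zero-filled inputs (the epoch counts 7 and 12 are arbitrary):

import numpy as np

# Two dummy signals: one with 7 epochs, one with 12 epochs of 3000 samples each
x_short = np.zeros((1, 7, 3000, 1, 1), dtype=np.float32)
x_long = np.zeros((1, 12, 3000, 1, 1), dtype=np.float32)

print(model.predict(x_short).shape)  # (1, 7, 2)
print(model.predict(x_long).shape)   # (1, 12, 2)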

Furthermore, GRU (or LSTM) allows us to exploit temporal information, which is essential in signals.

Unfortunately, this network will not train itself, especially in our example, where the data is split across various files identified by IDs. We therefore build a generator to manage the input.

Data Generator

In our example, we have a list of files, each representing an input signal. We therefore need to build a generator for TensorFlow that takes the signals and prepares them for the network at runtime.

Now let’s build a class that inherits from Sequence and implements all the methods necessary to train the network (__init__, __len__, __getitem__, on_epoch_end):

import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences

class DataGenerator(Sequence):
    """Generates data for Keras
    Sequence based data generator. Suitable for building data generator for training and prediction.
    """

    def __init__(self, list_IDs, input_path, target_path,
                 to_fit=True, batch_size=32, shuffle=True):
        """Initialization
        :param list_IDs: list of all 'label' ids to use in the generator
        :param to_fit: True to return X and y, False to return X only
        :param batch_size: batch size at each iteration
        :param shuffle: True to shuffle label indexes after every epoch
        """
        self.input_path = input_path
        self.target_path = target_path
        self.list_IDs = list_IDs
        self.to_fit = to_fit
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch
        :return: number of batches per epoch
        """
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        """Generate one batch of data
        :param index: index of the batch
        :return: X and y when fitting. X only when predicting
        """

        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]

        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]

        # Generate data
        X = self._generate_X(list_IDs_temp)

        if self.to_fit:
            y = self._generate_y(list_IDs_temp)
            return [X], y
        else:
            return [X]

    def on_epoch_end(self):
        """
        Updates indexes after each epoch
        """
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def _generate_X(self, list_IDs_temp):
        """Generates data containing batch_size images
        :param list_IDs_temp: list of label ids to load
        :return: batch of images
        """
        # Initialization
        X = []

        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            temp = self._load_input(self.input_path, ID)
            X.append(temp)

        # Pad with zeros so every signal in the batch has the same number of epochs;
        # dtype='float32' prevents the signal values from being cast to integers
        X = pad_sequences(X, value=0, padding='post', dtype='float32')

        return X

    def _generate_y(self, list_IDs_temp):
        """Generates data containing batch_size masks
        :param list_IDs_temp: list of label ids to load
        :return: batch if masks
        """
        y = []

        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            y.append(self._load_target(self.target_path, ID))

        y = pad_sequences(y, value=0, padding='post')

        return y
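
One detail worth spelling out: the class calls _load_input and _load_target, which are not shown because they depend on how your files are stored on disk. As a minimal sketch, assuming each ID corresponds to an .npz file containing an array x with the epochs of one signal and an array y with one label per epoch (the file layout and the array names x and y are assumptions of mine), the two methods could look like this:

    # To be added inside DataGenerator (os is assumed to be imported at the top of the file)
    def _load_input(self, input_path, ID):
        """Load the epochs of one signal from <input_path>/<ID>.npz"""
        with np.load(os.path.join(input_path, ID + '.npz')) as data:
            return data['x']  # assumed shape: (n_epochs, 3000, 1, 1)

    def _load_target(self, target_path, ID):
        """Load one label per epoch from <target_path>/<ID>.npz"""
        with np.load(os.path.join(target_path, ID + '.npz')) as data:
            return data['y']  # assumed shape: (n_epochs, 1)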

Once the generator class has been written, let’s extract the list of IDs of the files, splitting them into training, validation, and test. Note that we build two generators, one for training and one for validation:

import numpy as np
import re
from os import listdir
from os.path import isfile, join
TEST_SIZE = 128
onlyfiles = [f for f in listdir(input_path) if isfile(join(input_path, f))]

id = [re.search(r'(.+?)\.npz', x).group(1) for x in onlyfiles]
id.sort()

np.random.seed(1234)
id_test = np.random.choice(id, size=TEST_SIZE,replace=False)
id = list(set(id) - set(id_test))
id_validation = np.random.choice(id, size=TEST_SIZE,replace=False)
id = list(set(id) - set(id_validation))

print(len(id))

training_generator = DataGenerator(id, input_path=input_path,
                                   target_path=target_path)

validation_generator = DataGenerator(id_validation, input_path=input_path,
                                     target_path=target_path)

We can now finally run the model training, and cross our fingers 😆.

model.fit(training_generator,
          validation_data=validation_generator,
          epochs=8,
          use_multiprocessing=True)
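
Once training is finished, the held-out IDs can be evaluated with the same generator class. A short sketch, reusing the paths defined above and disabling shuffling:

test_generator = DataGenerator(list(id_test), input_path=input_path,
                               target_path=target_path, shuffle=False)

model.evaluate(test_generator)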

That’s all for this article. You will find the link to the repository containing the code in the references.

See you soon,

Francesco


References

  1. TimeDistributed Tensorflow wrapper
  2. Github repository with code
