
Predicting rare events is becoming an important topic of research and development in many Artificial Intelligence solutions. Survival analysis, customer churn prediction, predictive maintenance and anomaly detection are some of the most popular fields of application that deal with rare events. In these scenarios, we can think of a rare event as a particular state that occurs under specific conditions, diverges from normal behavior, and plays a key role in terms of economic interest.
In this post, I’ve developed a Machine Learning solution to predict the Remaining Useful Life (RUL) of a particular engine component. This kind of problem plays a key role in the field of Predictive Maintenance, where the goal is to answer the question ‘How much time is left before the next fault?‘. To achieve this, I developed a Convolutional Neural Network in Keras that deals with time series in the form of images.
THE DATASET
For data scientists, the most important problem when dealing with this kind of task is the scarcity of observed rare events. So the first step toward good performance is to obtain the richest dataset possible, one that covers every kind of plausible scenario.
The Turbofan Engine Degradation Simulation Dataset, provided by NASA, is becoming an important benchmark for Remaining Useful Life (RUL) estimation on a fleet of engines of the same type (100 in total). Data are available in the form of time series: 3 operational settings, 21 sensor measurements and the cycle, i.e. the observation index over the engine's working life.
The engine is operating normally at the start of each time series, and develops a fault at some point during the series. In the training set, the fault grows in magnitude until system failure. In the test set, the time series ends some time prior to system failure. The objective is to predict the number of remaining operational cycles before failure in the test set, i.e., the number of operational cycles after the last cycle that the engine will continue to operate.
To understand this better, let's have a look at the data:
# number of recorded cycles per engine id
train_df.id.value_counts().plot.bar()

Engines have different life durations. The average working life in the training data is 206 cycles, with a minimum of 128 and a maximum of 362.
The operational settings and sensor measurements for a single engine in the training set are plotted below:
# select engine 1 and plot all of its settings/sensors over time
engine_id = train_df[train_df['id'] == 1]
engine_id[train_df.columns[2:]].plot(subplots=True, sharex=True, figsize=(20,30))


Plotting is always a good idea: it gives us a general overview of the data at our disposal. Toward the end of most of the series, we can observe a divergent behavior that announces a future failure.
PREPARE THE DATA
In order to predict the RUL for each engine, we pursued a classification approach, generating the labels ourselves as follows:
From 0 (fault) to 15 remaining cycles, we labeled the observation as 2; from 16 to 45 remaining cycles as 1; and the rest (more than 45) as 0. In a realistic scenario, the category labeled 2 is clearly the most economically valuable. Predicting this class with good performance allows us to run an adequate maintenance program, avoiding future faults and saving money.
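The labeling scheme above can be sketched in pandas along these lines; this is a minimal illustration on a toy frame with the dataset's `id` and `cycle` columns, not the exact code used for the post:

```python
import pandas as pd

# toy frame with two engines; the real dataset has 100 ids
df = pd.DataFrame({
    'id':    [1] * 60 + [2] * 20,
    'cycle': list(range(1, 61)) + list(range(1, 21)),
})

# RUL = last recorded cycle of the engine minus the current cycle
max_cycle = df.groupby('id')['cycle'].transform('max')
df['RUL'] = max_cycle - df['cycle']

# label: 2 if RUL <= 15, 1 if 16 <= RUL <= 45, else 0
df['label'] = 0
df.loc[df['RUL'] <= 45, 'label'] = 1
df.loc[df['RUL'] <= 15, 'label'] = 2
```

The two `loc` assignments apply in order, so the tighter `RUL <= 15` condition overwrites the intermediate class where both hold.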
In order to have the maximum amount of training data at our disposal, we split each series with a fixed window sliding by 1 step. For example, engine 1 has 192 cycles in the training set; with a window length of 50 we extract 142 time series of length 50: window1 -> from cycle0 to cycle50, window2 -> from cycle1 to cycle51, … , window142 -> from cycle141 to cycle191. Each window is labeled with the label of the final cycle it covers.
sequence_length = 50

def gen_sequence(id_df, seq_len, seq_cols):
    # yield sliding windows of length seq_len over the feature columns
    data_matrix = id_df[seq_cols].values
    n_elem = data_matrix.shape[0]
    for a, b in zip(range(0, n_elem - seq_len), range(seq_len, n_elem)):
        yield data_matrix[a:b, :]

def gen_labels(id_df, seq_len, lab):
    # the label of each window is the label of its final cycle
    data_matrix = id_df[lab].values
    n_elem = data_matrix.shape[0]
    return data_matrix[seq_len:n_elem, :]
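The generator can then be used to stack the per-engine windows into a single model-ready array. A self-contained sketch follows; the toy frame and the `sensor_cols` names are illustrative placeholders, not the dataset's real columns:

```python
import numpy as np
import pandas as pd

def gen_sequence(id_df, seq_len, seq_cols):
    # yield sliding windows of length seq_len over the feature columns
    data_matrix = id_df[seq_cols].values
    n_elem = data_matrix.shape[0]
    for a, b in zip(range(0, n_elem - seq_len), range(seq_len, n_elem)):
        yield data_matrix[a:b, :]

# toy frame: one engine with 55 cycles and two hypothetical sensors
df = pd.DataFrame({
    'id': [1] * 55,
    's1': np.arange(55, dtype=float),
    's2': np.arange(55, dtype=float) * 2,
})

sequence_length = 50
sensor_cols = ['s1', 's2']

# stack the windows of every engine into one array
X = np.stack([
    seq
    for engine in df['id'].unique()
    for seq in gen_sequence(df[df['id'] == engine], sequence_length, sensor_cols)
])
# 55 cycles with a window of 50 yield 55 - 50 = 5 windows
```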
FROM TIME SERIES TO IMAGES
To make things more interesting, I decided to transform the series at our disposal into images, in order to feed them to our classification model.
I’ve created the images following this amazing resource. The concept is simple: when we transform time series into images, we usually make use of spectrograms. This choice is clever but not always the best one (as you can read here). In that post, the author explains his justified perplexity about representing audio series as spectrograms. He talks about sound, but the reasoning translates to our scenario. Spectrograms are powerful, but their usage may result in a loss of information, particularly if we try to approach the problem in a computer-vision way. To be efficient, a 2D CNN requires spatial invariance; this builds on the assumption that the features of a classical image (like a photo) carry the same meaning regardless of their location. A spectrogram, on the other hand, is a two-dimensional representation whose axes have two different units (frequency and time).
For these reasons, I decided to transform my time-series windows (of length 50 cycles) using Recurrence Plots. They are easy to implement in Python with a few lines of code, making use of SciPy.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def rec_plot(s, eps=0.10, steps=10):
    # pairwise distances between all points of the series,
    # discretized into `steps` levels of width `eps`
    d = pdist(s[:, None])
    d = np.floor(d / eps)
    d[d > steps] = steps
    # fold the condensed distance vector back into a square matrix
    return squareform(d)
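As a quick sanity check, feeding the function a toy window of 50 values yields a symmetric 50×50 matrix with a zero diagonal (every point is at distance zero from itself), which is exactly the image shape we need:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def rec_plot(s, eps=0.10, steps=10):
    # pairwise distances between all points, binned into `steps` levels
    d = pdist(s[:, None])
    d = np.floor(d / eps)
    d[d > steps] = steps
    return squareform(d)

# toy window of 50 cycles standing in for a real sensor series
series = np.sin(np.linspace(0, 4 * np.pi, 50))
img = rec_plot(series)
```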
With this function, we are able to generate a 50×50 image for every time series at our disposal (I’ve excluded the constant time series, which have zero variance). So every single observation becomes an array of images of size 50x50x17 (17 being the number of time series with non-zero variance), like the one below.

THE MODEL
At this point, we are ready to build our model. I’ve adopted a classical 2D CNN architecture:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(50, 50, 17)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I fit it for only 10 epochs and achieved an accuracy of 0.832 (i.e. 83.2%).

From the confusion matrix we can see that our model discriminates well between an engine close to failure (label 2: fewer than 16 cycles remaining) and one working normally (label 0: more than 45 cycles remaining). A little noise is present in the intermediate class (16–45 cycles). We are satisfied to achieve a clear result for the prediction of class 2, i.e. near failure.
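A confusion matrix and accuracy like those discussed here can be computed from the model's predictions with scikit-learn. Below is a minimal sketch where toy label arrays stand in for the real test labels and the argmax of the softmax output:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# toy stand-ins for the true labels and the model's predicted classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 0, 0, 2])

# rows = true class, columns = predicted class
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
acc = accuracy_score(y_true, y_pred)
```

With real data, `y_pred` would come from `model.predict(...).argmax(axis=1)`.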
SUMMARY
In this post, we tried to solve a Predictive Maintenance problem. When estimating the RUL of engines, we are conscious of dealing with rare events, due to the difficulty of collecting this kind of data. We proposed an interesting solution that transforms time series into images, making use of Recurrence Plots. In this way, we are able to discriminate well the engines that are at the end of their working life.
Keep in touch: Linkedin