
Exploring NASA’s turbofan dataset
*Disclaimer: I aim to showcase the effect of different methods and choices made during model development. These effects are often shown using the test set, something which is considered (very) bad practice but helps for educational purposes.*
In my last post we explored NASA’s FD001 turbofan degradation dataset. To quickly recap, sensors 1, 5, 6, 10, 16, 18 and 19 held no information related to Remaining Useful Life (RUL). After removing these from the data we fitted a baseline Linear Regression model with an RMSE of 31.95. Today we’ll re-examine our assumption of RUL to improve our accuracy and fit a Support Vector Regression (SVR) in an attempt to further improve upon our results. Let’s get started!
Loading data
First, we’ll load the data and inspect the first few rows to confirm it loaded correctly.
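A minimal sketch of the loading step; the file names and column names (unit_nr, time_cycles, the settings and s_1 through s_21) are my assumptions based on the standard CMAPSS layout:

```python
import pandas as pd

# assumed CMAPSS column layout: engine id, cycle, 3 settings, 21 sensors
index_names = ['unit_nr', 'time_cycles']
setting_names = ['setting_1', 'setting_2', 'setting_3']
sensor_names = ['s_{}'.format(i) for i in range(1, 22)]
col_names = index_names + setting_names + sensor_names

# the FD001 files are whitespace-separated and have no header
train = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None, names=col_names)
test = pd.read_csv('test_FD001.txt', sep=r'\s+', header=None, names=col_names)
y_test = pd.read_csv('RUL_FD001.txt', sep=r'\s+', header=None, names=['RUL'])

train.head()  # inspect the first few rows
```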

The loaded data looks good, so let's compute the linearly declining RUL as we did before.
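A sketch of that computation, reusing the column names assumed above:

```python
def add_remaining_useful_life(df):
    # the last recorded cycle of each engine is taken as its moment of failure,
    # so RUL = (max cycle of that engine) - (current cycle)
    max_cycles = df.groupby('unit_nr')['time_cycles'].transform('max')
    df['RUL'] = max_cycles - df['time_cycles']
    return df

train = add_remaining_useful_life(train)
```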

Before re-examining the way we computed the RUL, we'll prepare the data by dropping the columns which hold no useful information. That way our data is ready to be fitted and we can directly test any changes we make.
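Something along these lines, assuming (as is usual for FD001) that test predictions are made for the last recorded cycle of each engine:

```python
drop_sensors = ['s_1', 's_5', 's_6', 's_10', 's_16', 's_18', 's_19']
drop_labels = index_names + setting_names + drop_sensors

X_train = train.drop(columns=drop_labels + ['RUL'])
y_train = train['RUL']

# the provided true RUL values refer to each test engine's last recorded cycle
X_test = test.groupby('unit_nr').last().reset_index().drop(columns=drop_labels)
```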
We also define the function to evaluate model performance.
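A small helper like the one below would produce the output format shown throughout this post:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(y_true, y_hat, label='test'):
    # report root mean squared error and explained variance (R2)
    rmse = np.sqrt(mean_squared_error(y_true, y_hat))
    variance = r2_score(y_true, y_hat)
    print('{} set RMSE:{}, R2:{}'.format(label, rmse, variance))
```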
And refit our baseline model.
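Refitting the baseline then takes only a few lines:

```python
from sklearn.linear_model import LinearRegression

lm = LinearRegression()
lm.fit(X_train, y_train)

evaluate(y_train, lm.predict(X_train), label='train')
evaluate(y_test, lm.predict(X_test), label='test')
```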
```
# returns
# train set RMSE:44.66819159545453, R2:0.5794486527796716
# test set RMSE:31.952633027741815, R2:0.40877368076574083
```
Re-examining RUL
Previously I assumed RUL to decline linearly over time. However, in the last post, we saw this may influence our overall model performance. There’s a method to improve our assumption which I’ll try to explain below [1].
Looking at the sensor signals (see one example below), many sensors seem rather constant in the beginning. This is because the engines only develop a fault over time. The bend in the curve of the signal is the first bit of information provided to us that the engine is degrading and the first time it is reasonable to assume RUL is linearly declining. We can’t really say anything about the RUL before that point because we have no information about the initial wear and tear.

We can update our assumption to reflect this logic. Instead of having our RUL decline linearly, we define our RUL to start out as a constant and only decline linearly after some time (see example above). By doing so we achieve two things:
- Initially constant RUL correlates better with the initially constant mean sensor signal
- Lower peak values for RUL result in lower spread of our target variable, making it easier to fit a line
Consequently, this change allows our regression model to more accurately predict low RUL values, which are often more interesting/critical to predict correctly.
Using pandas, you can simply clip the previously computed linearly declining RUL at the desired upper bound value. Testing multiple upper bound values indicated clipping RUL at 125 yielded the biggest improvement for the model. As we’re updating our assumption of RUL for the train set, we should include this change in our evaluation. The true RUL of the test set remains untouched. Let’s examine the effect of this change.
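A sketch of the clipping and refit, using the upper bound of 125 found above:

```python
# clip the training targets at the chosen upper bound; test RUL stays untouched
y_train_clipped = y_train.clip(upper=125)

lm = LinearRegression()
lm.fit(X_train, y_train_clipped)

evaluate(y_train_clipped, lm.predict(X_train), label='train')
evaluate(y_test, lm.predict(X_test), label='test')
```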
```
# returns
# train set RMSE:21.491018701515415, R2:0.7340432868050447
# test set RMSE:21.900213406890515, R2:0.7222608196546241
```
The train RMSE has more than halved. Of course, we've set those targets ourselves, but it shows how much impact the previous assumption of RUL had on overall model performance. Much more important, though, is the improvement on the test set: the test RMSE dropped from 31.95 to 21.90, an improvement of 31%, which tells us the updated assumption is beneficial for modeling the true RUL. Let's see if we can do even better using another technique.
Support Vector Regression
Linear SVR mainly differs from Linear Regression by setting a boundary at a distance epsilon (ɛ) from your reference data (see image below). Points which fall within the boundaries are ignored when minimizing the loss function during model fitting. Fitting your model on points outside these boundaries reduces computational load and allows you to capture more intricate behavior, but this technique is also more sensitive to outliers!
![Source: [2]. The solid black line represents the target, the dashed lines are the boundaries at distance epsilon (ɛ). Only points outside the boundaries contribute to model fitting and minimizing the loss function. The loss function is similar to the loss functions of Ridge and Lasso regressions.](https://towardsdatascience.com/wp-content/uploads/2020/09/1sV3ctDOvNAUSIRmUnwKknA.png)
Instantiating an SVR is just as easy as setting up a Linear Regression; just make sure to set the kernel to 'linear'. After fitting the model, we evaluate again on both the training and test set.
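For example, reusing the clipped targets:

```python
from sklearn.svm import SVR

svr = SVR(kernel='linear')
svr.fit(X_train, y_train_clipped)

evaluate(y_train_clipped, svr.predict(X_train), label='train')
evaluate(y_test, svr.predict(X_test), label='test')
```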
```
# returns
# train set RMSE:29.57783070266026, R2:0.49623314435506494
# test set RMSE:29.675150117440094, R2:0.49005151605390174
```
Note the RMSEs are much worse than those of our Linear Regression with clipped RUL. Let's try to improve our model by scaling our features.
Scaling
SVRs work by comparing distances between feature vectors. When features vary in range, however, the calculated distance gets dominated by the feature with the bigger range. Say one feature ranges between 10–11 and another between 1000–1100: both vary by 10%, but the latter has a much larger absolute difference, so changes in its range will be weighted much more heavily by the SVR. To account for this, you can scale your features so they all fall within the same range, allowing the SVR to compare relative distances and weight variations approximately equally [3, 4].

Sklearn's MinMaxScaler can be used to create a scaler fitted to our train data. With the default settings it scales our training features between 0–1. The scaler is then applied to both our X_train and X_test set, after which we fit and evaluate a new SVR model on the scaled data.
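A sketch of the scaling and refit:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaler.fit(X_train)      # fit on the train set only to avoid leakage

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

svr = SVR(kernel='linear')
svr.fit(X_train_scaled, y_train_clipped)

evaluate(y_train_clipped, svr.predict(X_train_scaled), label='train')
evaluate(y_test, svr.predict(X_test_scaled), label='test')
```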
```
# returns
# train set RMSE:21.578263975067888, R2:0.7318795396979632
# test set RMSE:21.580480163289597, R2:0.730311354095216
```
Note the test RMSE of 21.58 is already an improvement upon our linear regression with clipped RUL, which had an RMSE of 21.90. Next, we’ll apply some feature engineering to try and improve our predictions even further.
Feature engineering
A very useful feature engineering technique is creating polynomial combinations of your features; these may reveal patterns in your data which aren't obvious from the original features. Say we want to create polynomial features of the second degree from s_2 and s_3; the result would be [1, s_2, s_3, s_2², s_3², s_2*s_3].
Applying this technique to all sensors in our current dataset increases the feature space from 14 to 120 features.
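A sketch, using sklearn's PolynomialFeatures on the scaled data:

```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)  # includes the bias term and interactions
X_train_poly = poly.fit_transform(X_train_scaled)
X_test_poly = poly.transform(X_test_scaled)

print(X_train_scaled.shape)
print(X_train_poly.shape)
```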
```
# returns
# (20631, 14)
# (20631, 120)
```
After engineering the new features, we fit a new model.
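Again reusing the names from the earlier sketches:

```python
svr = SVR(kernel='linear')
svr.fit(X_train_poly, y_train_clipped)

evaluate(y_train_clipped, svr.predict(X_train_poly), label='train')
evaluate(y_test, svr.predict(X_test_poly), label='test')
```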
```
# returns
# train set RMSE:19.716789731130874, R2:0.7761436785704136
# test set RMSE:20.585402508370592, R2:0.75460868821153
```
Note the test set RMSE and R2 have improved yet again, indicating there is additional information to be gained by adding the polynomial features. I also considered logarithmic transformations, but the sensor value ranges are not big enough to justify them. The polynomial features did blow up our feature space, however, making our model a bit 'bloated' and increasing training times. Let's see if we can tone it down a bit by keeping only the most informative features while maintaining our score.
Feature selection
Using the model which included the engineered features, we can calculate which features contribute most to model performance. To do so we use SelectFromModel, passing our trained model and setting prefit to True. We set the threshold for selecting 'important' features to 'mean', meaning a selected feature must have a feature importance greater than the mean feature importance of the whole set. Calling get_support then returns a Boolean array indicating which features clear that threshold; we use it to subset our features, keeping only those for which 'feature importance > mean feature importance' is True.
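A sketch (on older sklearn versions, poly.get_feature_names() takes the place of get_feature_names_out()):

```python
from sklearn.feature_selection import SelectFromModel

# wrap the already-fitted SVR; threshold='mean' keeps features whose
# absolute coefficient exceeds the mean absolute coefficient
selector = SelectFromModel(svr, threshold='mean', prefit=True)
support = selector.get_support()   # Boolean mask over the 120 features

print(X_train.columns)             # original features
best_features = poly.get_feature_names_out()[support]
print(best_features)               # best features
print(best_features.shape)         # shape

X_train_selected = X_train_poly[:, support]
X_test_selected = X_test_poly[:, support]
```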
```
# returns
# Original features:
Index(['s_2', 's_3', 's_4', 's_7', 's_8', 's_9', 's_11', 's_12', 's_13',
       's_14', 's_15', 's_17', 's_20', 's_21'],
      dtype='object')
# Best features:
['x0' 'x1' 'x2' 'x3' 'x5' 'x6' 'x7' 'x9' 'x10' 'x11' 'x12' 'x13' 'x2 x5'
 'x2 x8' 'x2 x9' 'x3 x5' 'x3 x8' 'x3 x9' 'x4^2' 'x4 x6' 'x4 x7' 'x4 x8'
 'x5^2' 'x5 x6' 'x5 x7' 'x5 x9' 'x5 x12' 'x5 x13' 'x6^2' 'x6 x8' 'x6 x9'
 'x7 x8' 'x7 x9' 'x8^2' 'x9^2' 'x9 x12' 'x9 x13']
# shape: (37,)
```
A new SVR model is fitted and evaluated with the selected features.
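For example:

```python
svr = SVR(kernel='linear')
svr.fit(X_train_selected, y_train_clipped)

evaluate(y_train_clipped, svr.predict(X_train_selected), label='train')
evaluate(y_test, svr.predict(X_test_selected), label='test')
```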
```
# returns
# train set RMSE:19.746789101481127, R2:0.775461959316527
# test set RMSE:20.55613819605483, R2:0.7553058913450649
```
Note the test RMSE and R2 have slightly improved while the number of features used by the model has been reduced from 120 to 37! The improvement is probably due to the model overfitting slightly less on the train set. We now have all the building blocks to train and select our final model.
Selecting our final model
For the final model we'll use simple hyperparameter tuning on the train set to tune the value of epsilon. As explained earlier in this post, epsilon sets the distance of the boundaries around the target; only datapoints outside those boundaries are considered while minimizing the loss function.
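A simple loop over candidate epsilon values could look like this (the candidate list mirrors the output below):

```python
for epsilon in [0.4, 0.3, 0.2, 0.1, 0.05]:
    svr = SVR(kernel='linear', epsilon=epsilon)
    svr.fit(X_train_selected, y_train_clipped)
    y_hat = svr.predict(X_train_selected)
    print('epsilon:', epsilon,
          'RMSE:', np.sqrt(mean_squared_error(y_train_clipped, y_hat)),
          'R2:', r2_score(y_train_clipped, y_hat))
```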
```
# returns
# epsilon: 0.4 RMSE: 19.74772556660336 R2: 0.7754406619776462
# epsilon: 0.3 RMSE: 19.747580761069848 R2: 0.7754439552496148
# epsilon: 0.2 RMSE: 19.74660007817171 R2: 0.7754662580123992
# epsilon: 0.1 RMSE: 19.746789101481127 R2: 0.775461959316527
# epsilon: 0.05 RMSE: 19.746532456984006 R2: 0.7754677958176168
```
An epsilon of 0.2 is among the best performers on the training set (the differences between candidates are marginal). Let's retrain our model with that value and check the final result.
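For example:

```python
svr = SVR(kernel='linear', epsilon=0.2)
svr.fit(X_train_selected, y_train_clipped)

evaluate(y_train_clipped, svr.predict(X_train_selected), label='train')
evaluate(y_test, svr.predict(X_test_selected), label='test')
```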
```
# returns
# train set RMSE:19.74660007817171, R2:0.7754662580123992
# test set RMSE:20.54412482077374, R2:0.7555918150093489
```
The linear model with clipped RUL had an RMSE of 21.90, which was a 31% improvement over our baseline regression with its RMSE of 31.95. Our final model uses an epsilon-tuned SVR, clipped RUL for training, feature scaling and the most informative second-order polynomial features, reaching a test RMSE of 20.54. That's an improvement of 6% over our linear model with clipped RUL and an overall 35.7% improvement over the baseline model!
In the end, this post shows the importance of framing your Data Science problem correctly. While the SVR is definitely an improvement over the linear regression, its improvement pales in comparison to our updated assumption of RUL. For the complete notebook you can check out my github repo here.
I would like to thank Maikel Grobbe and Wisse Smit for their inputs and reviewing my article. Next time we’ll delve into time series analysis and the RMSE of 20.54 will be the score to beat. If you have any tips, questions or remarks, please leave them in the comments below!
References:
[1] F. O. Heimes, "Recurrent neural networks for remaining useful life estimation," 2008 International Conference on Prognostics and Health Management, Denver, CO, 2008, pp. 1–6, doi: 10.1109/PHM.2008.4711422.
[2] T. Kleynhans, M. Montanaro, A. Gerace and C. Kanan, "Predicting Top-of-Atmosphere Thermal Radiance Using MERRA-2 Atmospheric Data with Deep Learning," Remote Sensing, vol. 9, no. 11, p. 1133, 2017, doi: 10.3390/rs9111133.
[3] https://en.wikipedia.org/wiki/Feature_scaling
[4] https://stats.stackexchange.com/questions/154224/when-using-svms-why-do-i-need-to-scale-the-features