Data-Driven Predictive Maintenance In a Nutshell

An Overview of Fundamental Concepts and Popular Algorithms

Photo by Brett Jordan on Unsplash

For complex systems such as airplanes, railways, and power plants, maintenance is a critical concern because it ensures the systems’ reliability and safety throughout their life cycles.

By tapping the power of advanced sensor capability, IoT technology, and data analytics algorithms, maintenance in the era of Industry 4.0 has experienced a rapid shift from "reactive" to "proactive": instead of performing maintenance only when failure has already occurred, the state-of-the-art strategy is to actively anticipate system degradation and schedule maintenance "just-in-time." This new type of maintenance is known as predictive maintenance (PdM).

In practice, PdM is typically achieved by first using sensors to monitor the system’s health state constantly. Subsequently, data analytics algorithms are employed to predict the system’s remaining useful life based on up-to-date measurements. Finally, a maintenance schedule is devised accordingly to maintain the system in its originally intended function.

At the core of PdM are prognostic techniques, which predict the degradation trend of an in-service system from real-time measured data. This is where Data Science shines: machine learning models are commonly built to identify the characteristics of the current health state of the system and to predict the remaining time until system failure occurs.

In this post, let’s take a close look at the commonly adopted data-driven models for performing prognostic tasks. This article is structured in the following way:

  • First, to set the stage, we will briefly review the major steps involved in predictive maintenance.
  • Second, we will classify systems into different categories based on their characteristics, and we will discuss popular Machine Learning models for each system category.
  • Finally, we will talk about the challenges faced in delivering reliable prognostic analysis.

Let’s get started!


1. Main Steps of Predictive Maintenance

Fig. 1 Main steps in predictive maintenance. (Image by Author)

PdM’s main steps include data acquisition, diagnostics, prognostics, and health management, as demonstrated in the infographic above.

1.1 Data Acquisition

The data acquisition step deals with collecting measurement data from the sensors and processing the raw signal to extract useful features that could indicate the system’s health state. This latter task is commonly known as feature engineering in data science.

To extract features, signal processing techniques are usually employed to transform raw data into features in a different domain (e.g., time, frequency, time-frequency). Since PdM mainly encounters non-stationary signals, time-frequency analysis tools are handy for extracting features for diagnostic and prognostic purposes. In this category, the short-time Fourier transform, wavelet packet decomposition, empirical mode decomposition, and the Hilbert-Huang transform are the most popular approaches.
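To make the time-frequency idea concrete, here is a minimal sketch of a short-time Fourier transform written with the standard library only. The signal is synthetic, and the function names, window size, and hop length are illustrative assumptions rather than recommended settings; in practice a tested library routine would be the better choice.

```python
import cmath
import math

def stft_magnitudes(signal, window_size, hop):
    """Return one magnitude spectrum (list of floats) per analysis window."""
    # A Hann window reduces spectral leakage at the window edges.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_size - 1))
            for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * hann[n] for n in range(window_size)]
        spectrum = []
        # Naive DFT; only the non-negative frequency bins are kept.
        for k in range(window_size // 2 + 1):
            coeff = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                        for n in range(window_size))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

def dominant_bin(spectrum):
    """Index of the frequency bin with the largest magnitude."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])

# Synthetic non-stationary signal (assumption): the dominant frequency
# jumps from bin 4 to bin 12 halfway through the record.
N = 64
signal = ([math.sin(2 * math.pi * 4 * n / N) for n in range(256)]
          + [math.sin(2 * math.pi * 12 * n / N) for n in range(256)])

frames = stft_magnitudes(signal, window_size=N, hop=N)
peaks = [dominant_bin(frame) for frame in frames]
print(peaks)
```

The dominant bin per window is one simple feature that tracks how the frequency content shifts over time, which a single Fourier transform of the whole record would blur together.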

An additional step after feature extraction is feature reduction. This is necessary because the extracted features are usually too numerous to be exploited in practice. Popular dimensionality reduction methods, such as principal component analysis (PCA), kernel PCA, and Isomap, are usually employed to eliminate redundant features.

1.2 Diagnostics

The diagnostics step deals with fault detection and failure mode identification based on the extracted feature values.

Fault diagnostics is usually formulated as a classification problem. As a result, popular classification methods, such as k-nearest neighbors, support vector machines, decision trees, and random forests, are widely adopted to predict the system health state labels given the observed feature values. A simple illustration of applying the decision tree model to classify the system failure mode is given below.

Fig. 2 Using decision tree to perform fault diagnostics. (Image by Author)
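To make the classification framing concrete, here is a minimal sketch using k-nearest neighbors, one of the methods listed above (chosen here only because it fits in a few lines, not because it is preferred over a decision tree). The two features, the fault labels, and all numbers are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Sort training points by Euclidean distance to the query point.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training data: (vibration RMS, temperature in deg C).
# Note: features are left unscaled for brevity; in practice they should be
# standardized so that no single feature dominates the distance.
train_X = [(0.20, 40.0), (0.30, 42.0), (0.25, 41.0),
           (1.10, 45.0), (1.30, 44.0), (1.20, 46.0),
           (0.40, 80.0), (0.30, 85.0), (0.35, 82.0)]
train_y = ["healthy"] * 3 + ["bearing_fault"] * 3 + ["overheating"] * 3

print(knn_predict(train_X, train_y, (1.15, 45.0)))
print(knn_predict(train_X, train_y, (0.30, 83.0)))
```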

1.3 Prognostics

The next step of PdM is prognostics. Here, the goal is to predict the monitored system’s future state and estimate the system’s remaining useful life (RUL), i.e., how long it will take until system failure occurs.

Fig. 3 An illustration of prognostic analysis. (Image by Author)

Prognostics is the key technology that drives intelligent PdM. Since it predicts the time at which a system will no longer perform its intended function, it gives users the opportunity to mitigate failure risk while extending the system’s useful life.

Naturally, data-driven approaches are heavily investigated to estimate the RUL. As a result, an array of machine learning strategies have been proposed for various application cases. In section 3, we will review some of the frequently used approaches for RUL prediction.

1.4 Health Management

After detecting the system fault and estimating the system’s remaining useful life, it is time to take some actions based on the obtained results.

The main goal of the health management step is to manage the maintenance and logistic support in an optimal manner, i.e., achieving increased availability, reliability, and safety, as well as reduced maintenance and logistics cost. Health management is usually formulated as a constrained optimization problem, where global optimization algorithms are adopted to derive the best maintenance scheduling.


2. Classification of System Characteristics

Prognostic methods generally differ according to the type of system considered. Therefore, it would be a good idea to first classify various systems based on their characteristics before discussing specific methods under individual categories.

2.1 System Characteristics

In general, we can categorize a system based on whether its state is directly observable or indirectly observable, as well as whether its state is modeled as a discrete process or a continuous process. The following decision tree illustrates this classification scheme.

Fig. 4 Taxonomy of prognostic methods. (Image by Author)

2.2 Directly or Indirectly Observable?

The first criterion is the observability of the system’s state.

On some occasions, the monitored data can directly describe the system’s underlying state, such as wear and crack size. For those cases, an estimation of RUL can be effectively formulated as a time series prediction problem.

However, on many other occasions, the monitored data can only indirectly indicate the system’s underlying state, such as vibration and oil-based monitoring for rotating machines. In those situations, we can frame the prognostic problem as solving two coupled equations:

Fig. 5 We can frame prognostic problems as solving two coupled equations. Here, index k indicates the time step. (Image by Author)

The measurement equation bridges the gap between the measured feature values and the internal system state. Here, h(.) denotes the measurement model, and ν represents the measurement noise.

Meanwhile, we have a state evolution equation to describe the system degradation process. Here, f(.) denotes the degradation model, and w represents the model uncertainty. This uncertainty term is induced because the degradation model can only partially reflect the true physical process.
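Written out, the two coupled equations take the generic state-space form below. The symbols x_k (the hidden system state) and y_k (the measured feature values) are our own naming assumptions, since the figure is not reproduced in text; f, h, w, and ν follow the definitions given above, and k is the time step.

```latex
\begin{aligned}
x_k &= f(x_{k-1},\, w_k) && \text{(state evolution equation)} \\
y_k &= h(x_k,\, \nu_k)   && \text{(measurement equation)}
\end{aligned}
```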

2.3 Discrete or Continuous State Evolution?

The second criterion is based on how we model the state evolution of the system.

For some cases, we assume the system evolves on a finite state space Φ = {0, 1, …, N}, where 0 corresponds to the perfectly healthy state and N represents the failed state. Those discrete states can be defined from meaningful operational conditions in practice, such as "Good," "Minor defects only," "Maintenance required," or they can be derived by applying unsupervised clustering techniques to the training data.

For other cases, it may make more sense to model the system evolution as a continuous process. For example, the battery internal resistance, which commonly serves as a health indicator for lithium-ion batteries, changes continuously as the battery goes through a sequence of charge-discharge cycles.


3. Prognostic Algorithms

In this section, we will discuss some of the commonly employed machine learning methods for the prognostics purpose, i.e., predicting the system’s remaining useful life (RUL). We organize our discussion according to the categories introduced in the previous section.

3.1 Markov Models

Markov models are useful for systems whose states are directly observable and evolve in a discrete manner.

In general, Markov methods model the system degradation as a stochastic process that jumps between a finite set of states Φ = {0, 1, …, N}. Here, 0 corresponds to the perfectly healthy state, and N represents the failed state. The sequence of the states constitutes a Markov chain.

The primary assumption of the Markov model is that the future system state depends only on the current system state. This property is also known as the Markov property.

Under the Markov chain modeling framework, the RUL can be defined as the amount of time the degradation process takes to move from the current state to the failure state N for the first time. This is also known as the first passage time (FPT).

Of course, to calculate RUL using Markov methods, we need to know the number of states and the transition probability matrix A between states, where Aᵢⱼ denotes the transition probability from state i to state j. In practice, both are estimated from the training data. For determining the number of states, a k-means clustering algorithm is usually employed.
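As a rough sketch of how this works, the code below estimates the transition matrix by counting transitions in observed state sequences and then computes the expected first passage time to the failed state from every state. The state sequences are made up for illustration, the function names are our own, and only the mean of the FPT is computed here, not its full distribution.

```python
def estimate_transition_matrix(sequences, n_states):
    """Estimate A[i][j] as the fraction of observed i -> j transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1
    A = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No observed exits from state i: treat it as absorbing.
            A.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            A.append([c / total for c in row])
    return A

def expected_first_passage(A, failed_state, n_iter=10_000, tol=1e-12):
    """Expected number of steps to first reach failed_state from each state.

    Solves m_i = 1 + sum_{j != failed} A[i][j] * m_j by fixed-point iteration.
    """
    n = len(A)
    m = [0.0] * n
    for _ in range(n_iter):
        new = [0.0 if i == failed_state else
               1.0 + sum(A[i][j] * m[j] for j in range(n) if j != failed_state)
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    return m

# Hypothetical run-to-failure histories over states 0 (healthy), 1, 2 (failed).
sequences = [[0, 0, 1, 1, 2], [0, 1, 2]]
A = estimate_transition_matrix(sequences, n_states=3)
mean_fpt = expected_first_passage(A, failed_state=2)
print(A)
print(mean_fpt)
```

If the system is currently in state i, `mean_fpt[i]` is the expected RUL in units of the inspection interval used to record the sequences.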

3.2 Time Series Forecasting

Time series forecasting methods are useful for systems whose states are directly observable and evolve in a continuous manner. In those settings, RUL estimation amounts to estimating how long it takes the measured time series to reach a predefined threshold.
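The threshold-crossing idea can be sketched with the simplest possible forecaster: a straight line fitted by least squares and extrapolated forward. This is only an illustration under the assumption that the health indicator grows roughly linearly toward the failure threshold; the function name and the crack-size numbers are hypothetical.

```python
def linear_rul(times, values, threshold):
    """Extrapolate a least-squares line and return the time left until it
    crosses the threshold, or None if the fitted trend never reaches it."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
             / sum((t - t_mean) ** 2 for t in times))
    intercept = v_mean - slope * t_mean
    if slope <= 0:
        # No upward trend: under this model the threshold is never reached.
        return None
    t_cross = (threshold - intercept) / slope
    # RUL is measured from the most recent observation time.
    return max(0.0, t_cross - times[-1])

# Hypothetical crack-size measurements (mm) at inspection times (cycles x 1000).
times = [0, 1, 2, 3, 4, 5]
crack_size = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
rul = linear_rul(times, crack_size, threshold=6.0)
print(rul)
```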

A variety of approaches exist that can build time series forecasting models. For example, we have the exponential smoothing method, which, in its basic form, forecasts the new observations as a weighted average of past observations, with the weights decreasing exponentially back in time.
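Here is a minimal sketch of simple exponential smoothing, together with a check that the recursive form really is a weighted average with exponentially decaying weights. The smoothing factor `alpha` and the sample values are arbitrary assumptions.

```python
def simple_exponential_smoothing(observations, alpha):
    """One-step-ahead forecast via the recursion
    level = alpha * y + (1 - alpha) * level."""
    level = observations[0]
    for y in observations[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def explicit_weighted_average(observations, alpha):
    """Same forecast, written as an explicit weighted sum of past values."""
    n = len(observations)
    # The most recent observation gets weight alpha, the one before it
    # alpha * (1 - alpha), and so on back in time.
    forecast = sum(alpha * (1 - alpha) ** i * observations[n - 1 - i]
                   for i in range(n - 1))
    # The initial observation carries the remaining weight.
    forecast += (1 - alpha) ** (n - 1) * observations[0]
    return forecast

observations = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exponential_smoothing(observations, alpha=0.5)
print(forecast, explicit_weighted_average(observations, alpha=0.5))
```

Note that this basic form produces a flat forecast, so for RUL estimation a variant with a trend component would be needed to project the series toward a threshold.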

Also, we have the ARIMA models. Here, ARIMA stands for autoregressive integrated moving average. ARIMA combines an autoregressive model, which regresses the new observation value on the past observation values, and a moving average model, which models the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. Finally, ARIMA employs a differencing step (corresponding to the "integrated" part of the model) to remove trend-induced non-stationarity.
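For reference, the three ingredients can be written compactly as an ARIMA(p, d, q) model using the backshift operator B, where B yₜ = yₜ₋₁. The symbols below (φ for autoregressive coefficients, θ for moving average coefficients, c for a constant, ε for the error term) follow common textbook notation and are not taken from this article.

```latex
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^i\Bigr)\,(1 - B)^d\, y_t
  \;=\; c + \Bigl(1 + \sum_{j=1}^{q} \theta_j B^j\Bigr)\,\varepsilon_t
```

The first factor is the autoregressive part of order p, (1 − B)^d applies the differencing d times, and the right-hand side is the moving average part of order q.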

Besides classical time series forecasting methods, we also have neural networks at our disposal. Recurrent neural networks, in particular long short-term memory (LSTM) models, have gained popularity in recent years for forecasting purposes.

3.3 Hidden Markov Models

Hidden Markov models (HMMs) are useful for systems whose states can only be indirectly observed and evolve in a discrete manner.

Conceptually, an HMM consists of two stochastic processes: an observable process Yₙ, which accounts for observations obtained from sensor measurements, and a system degradation process Zₙ, whose states are unobservable and evolve according to a Markov chain on a finite state space.

Fig. 6 A sketch of hidden Markov models. (Image by Author)

Generally, a conditional probability P(Yₙ | Zₙ = i) is employed to describe the relationship between those two processes.

The fundamental HMM can only handle discrete observations, i.e., the observable process Yₙ evolves on a finite state space. However, in practice, the observations Yₙ are often continuous. In those situations, a mixture of Gaussians hidden Markov model (MoG-HMM) is usually employed to handle continuous observations. There, a mixture of Gaussian distributions is used to approximate the probability density P(Yₙ | Zₙ = i).
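For the fundamental discrete-observation case, the hidden health state can be inferred with the standard forward (filtering) recursion. The sketch below is a minimal version under assumed numbers: the transition matrix `A`, the emission matrix `B` (where `B[i][y]` plays the role of P(Yₙ = y | Zₙ = i)), and the observation sequence are all hypothetical.

```python
def forward_filter(pi, A, B, observations):
    """Return P(Z_n = i | Y_1, ..., Y_n) for the most recent step n."""
    n_states = len(pi)
    # Initialize with the prior weighted by the first observation's likelihood.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for y in observations[1:]:
        # Predict: push the current belief through the Markov chain.
        predicted = [sum(alpha[i] * A[i][j] for i in range(n_states))
                     for j in range(n_states)]
        # Update: reweight by how well each state explains the observation.
        alpha = [predicted[j] * B[j][y] for j in range(n_states)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha

# Hypothetical model: states 0 (healthy), 1 (degraded), 2 (failed);
# observation symbols 0 (normal reading) and 1 (abnormal reading).
pi = [1.0, 0.0, 0.0]
A = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
B = [[0.9, 0.1],
     [0.4, 0.6],
     [0.1, 0.9]]

posterior = forward_filter(pi, A, B, observations=[0, 0, 1, 1])
most_likely_state = max(range(len(posterior)), key=lambda i: posterior[i])
print(posterior, most_likely_state)
```

Once the belief over hidden states is available, the RUL can be estimated with the same first-passage reasoning used for ordinary Markov chains, weighted by that belief.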

3.4 Stochastic Filtering Methods

For systems whose states can only be indirectly observed and evolve in a continuous manner, stochastic filtering methods are good choices for predicting RUL.

Stochastic filtering methods emerge from a larger field of study known as data assimilation. There, the goal is to estimate the probability distribution of the system’s state by assimilating information from both observations and model predictions.

Stochastic filtering methods use Bayesian learning to iteratively update the system state and parameters that govern the state evolution as the new measurements become available. Once the state and parameters are estimated, we can predict the future system states using the evolution equations. An overall workflow is sketched below.

Fig. 7 The workflow of using stochastic filtering methods to predict RUL. (Image By Author)

For the measurement equation, the measurement model h(.) is usually a data-driven model derived from the training data. For the state evolution equation, f(.) can be either derived from physical principles or supervised learning methods, depending on the availability of the physical degradation knowledge and the relevant training data.

Regarding the stochastic filtering techniques for state estimation, the most commonly used methods are based on the Kalman filter, which has closed-form solutions and is very fast to evaluate. However, it can only handle linear models with Gaussian noise terms. To overcome this limitation, more advanced variants have been proposed, including the unscented Kalman filter and the ensemble Kalman filter.
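To show what the closed-form solution looks like, here is a minimal one-dimensional Kalman filter. It assumes a scalar linear model x_k = a·x_{k-1} + w and y_k = c·x_k + ν with Gaussian noise variances q and r; the parameter names and measurement values are our own illustrative choices.

```python
def kalman_step(mean, var, y, a, c, q, r):
    """One predict/update cycle of a scalar Kalman filter."""
    # Predict: propagate the state estimate through the linear evolution model.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (y - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new

# Hypothetical noisy measurements of a slowly varying health indicator.
measurements = [1.1, 0.9, 1.0, 1.2]
mean, var = 0.0, 1.0  # vague initial belief
history = []
for y in measurements:
    mean, var = kalman_step(mean, var, y, a=1.0, c=1.0, q=0.01, r=0.25)
    history.append((mean, var))
print(history)
```

Each step costs only a handful of arithmetic operations, which is why Kalman-type filters are attractive for real-time monitoring when the linear-Gaussian assumption is acceptable.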

The most general filtering technique is particle filtering, also known as the sequential Monte Carlo (SMC) method. This type of filtering approach uses a set of weighted particles (also called samples) to represent the probability distribution of the system states and evolution parameters. Once new observations are available, the weights of those particles are updated according to Bayes’ rule. Owing to its simulation-based nature, particle filtering can handle the non-linear models and non-Gaussian noise found in real-world applications.
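The sketch below shows a bare-bones bootstrap particle filter for a scalar state with a non-linear measurement model. Everything in it is an assumption made for illustration: the evolution model f, the measurement model h, the noise levels, and the choice of multinomial resampling. For brevity it tracks only the state, not the evolution parameters.

```python
import math
import random

def particle_filter(observations, n_particles, f, h,
                    process_std, meas_std, init_sampler, rng):
    """Bootstrap particle filter; returns the posterior mean at each step."""
    particles = [init_sampler(rng) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate each particle through the evolution model plus noise.
        particles = [f(x) + rng.gauss(0.0, process_std) for x in particles]
        # Weight by the Gaussian measurement likelihood (Bayes' rule,
        # up to a normalizing constant).
        weights = [math.exp(-0.5 * ((y - h(x)) / meas_std) ** 2)
                   for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

rng = random.Random(0)
f = lambda x: 1.05 * x   # assumed degradation model: 5% growth per step
h = lambda x: x * x      # assumed non-linear measurement model

# Simulate a hypothetical true trajectory and noisy measurements of it.
truth, x = [], 1.0
for _ in range(20):
    x = f(x)
    truth.append(x)
observations = [h(x) + rng.gauss(0.0, 0.2) for x in truth]

estimates = particle_filter(observations, n_particles=1000, f=f, h=h,
                            process_std=0.02, meas_std=0.2,
                            init_sampler=lambda r: r.uniform(0.5, 1.5),
                            rng=rng)
print(estimates[-1], truth[-1])
```

To turn the state estimate into an RUL prediction, the resampled particles can be propagated forward through f until each one crosses a failure threshold, which yields a whole distribution of RUL values rather than a single number.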


4. Challenges In Prognostics

Despite the rapid advances in prognostic algorithms, performing a reliable prognostic analysis is not always easy in reality. There are a number of challenges that may prevent us from achieving the goal:

  • Sensor reliability and failures, as sensors may operate in a hostile environment;
  • Feature extraction, as it is a non-trivial task to isolate features that are related to the degradation process for complex systems;
  • Data availability, as employing machine learning techniques for prognostics usually requires a large amount of training data (especially run-to-failure data), which are not readily accessible from in-service systems due to time and cost.

Besides the above-mentioned problems, uncertainty encountered in the prognostics constitutes another major challenge for obtaining a reliable RUL estimation.

Prognostic uncertainties may originate from:

  • Input data: sensor data may contain a significant level of noise. Also, environmental and operational loading conditions are constantly changing.
  • Model: due to limited training data, the constructed data-driven models may fail to capture the true system degradation process accurately, thus producing modeling errors and uncertainties.

Since these uncertainties can lead to significant deviation of prognostics results from the actual situation, developing a systematic uncertainty management framework is crucial for delivering meaningful RUL predictions. To learn more about how to manage uncertainties associated with model-based predictions, please check out my previous article here:

Uncertainty Quantification Explained


5. Takeaways

In this article, we’ve talked about the fundamental concepts of Predictive Maintenance and introduced some of the most popular algorithms for estimating the system’s remaining useful life. In addition, we’ve discussed a number of challenges in obtaining reliable prognostic results.

The key takeaways of this article include:

  • Predictive maintenance (PdM) consists of data acquisition, diagnostics, prognostics, and health management;
  • Prognostics is the key technology that enables intelligent PdM;
  • The main task of prognostics is to predict the system’s remaining useful life (RUL);
  • Based on the characteristics of the investigated system (directly/indirectly observable, discrete/continuous state evolution), Markov models, time series forecasting methods, hidden Markov models, and stochastic filtering approaches are commonly adopted to estimate the RUL;
  • A reliable prognostic analysis is not easy to deliver due to the challenges presented in sensor reliability and failures, feature extraction, data availability, and prognostic uncertainties.


About the Author

I’m a Ph.D. researcher working on uncertainty quantification and reliability analysis for aerospace applications. Statistics and data science form the core of my daily work. I love sharing what I’ve learned in the fascinating world of statistics. Check my previous posts to find out more and connect with me on Medium and Linkedin.

