Notes from Industry
Introduction
Given servo feedback information, such as torque, velocity, acceleration, and power, one can predict the likelihood of an issue with the servo. By knowing when the servo is likely to fail, one can reduce downtime and prevent potential damage to the system from a faulty servo. By applying a simple Gaussian probability density function to a set of features learned from training data, a servo can be classified as anomalous at little computational cost.
Background
Gaussian Distributions¹ with various mean and variance values for a single feature x are shown below:

The figure illustrates how the variance affects the probability density function magnitude and coverage along the feature axis. A higher variance value indicates the data are more spread out, and a lower variance value indicates the data are close in value, with a variance of zero meaning all of the data are identical.
Theory
The Gaussian probability density function that models the above illustrated curves is shown below:
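For reference, the standard univariate Gaussian density is written out here. It is assumed to be the form this article refers to as Equation 1, with μ the mean and σ² the variance of feature x:

```latex
p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}
```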

The function gives the probability density at a given feature value x. The mean (μ) and variance (σ²) are both determined before the function is evaluated. The mean is determined using the general form:
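Assuming m training samples x⁽¹⁾, …, x⁽ᵐ⁾ of a feature (notation borrowed from the cited course¹, not necessarily the author's), the usual sample mean is:

```latex
\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}
```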

The variance is determined using the following:
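A common choice, assumed here, is the population form that divides by the number of training samples m. The unbiased estimator divides by m − 1 instead, and with tens of thousands of samples the difference is negligible:

```latex
\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2
```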

The mean and variance are calculated for each feature that will be used to train and evaluate the model. It is important to vectorize the calculations as much as possible with the programming environment used.
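As a minimal sketch of the per-feature fit, the snippet below computes the mean and population variance of each feature column using only the Python standard library. The function name `fit_gaussian` and the toy data are hypothetical. In an array environment such as NumPy, the same result would typically come from the vectorized calls `X.mean(axis=0)` and `X.var(axis=0)`.

```python
from statistics import fmean, pvariance

def fit_gaussian(samples):
    """Return per-feature (means, variances) for rows of feature values.

    `samples` is a list of equal-length rows, one row per training sample.
    """
    columns = list(zip(*samples))  # transpose: one tuple per feature
    means = [fmean(col) for col in columns]
    # pvariance divides by m (population form); passing mu reuses the mean
    variances = [pvariance(col, mu=mu) for col, mu in zip(columns, means)]
    return means, variances

# Hypothetical toy data: four samples of two features
training = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]]
means, variances = fit_gaussian(training)
print(means, variances)
```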
The function outputs the Gaussian probability density, whose magnitude varies significantly between datasets and features. The output value needs to be mapped to a binary value indicating the classification of the servo status (anomalous or non-anomalous). This is done by specifying a threshold value for the probability density function: densities below the threshold are classified as anomalous.
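The mapping from density to a binary class can be sketched as follows. The function names, learned parameters, and threshold are hypothetical values chosen for illustration, not the ones used in this article.

```python
import math

def gaussian_pdf(x, mu, var):
    """Univariate Gaussian density at x for mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def is_anomalous(x, mu, var, threshold):
    """Flag x as anomalous when its density falls below the threshold."""
    return gaussian_pdf(x, mu, var) < threshold

# Hypothetical learned parameters and threshold, for illustration only
mu, var, threshold = 0.0, 1.0, 0.05
print(is_anomalous(0.5, mu, var, threshold))  # sample near the mean
print(is_anomalous(3.0, mu, var, threshold))  # sample far from the mean
```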
The threshold can be chosen iteratively: pick a value, observe how the model behaves across a range of feature values, and adjust. The right value depends on how sensitive the model should be when flagging anomalous data. If a cross-validation or test dataset is available, the precision, recall, F1 score, and integrated F1 score³ can be evaluated to find the threshold at which the model performs best.
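When labeled validation data is available, the search can be sketched as a sweep over candidate thresholds that keeps the one with the highest F1 score. The helper names and the validation values below are hypothetical, and the integrated F1 score is not covered here.

```python
def f1_score(densities, labels, threshold):
    """F1 for the rule 'anomalous if density < threshold' (label True = anomalous)."""
    predicted = [d < threshold for d in densities]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

def best_threshold(densities, labels, candidates):
    """Return the candidate threshold with the highest F1 score."""
    return max(candidates, key=lambda t: f1_score(densities, labels, t))

# Hypothetical validation set: density per sample and its true label
densities = [0.40, 0.35, 0.30, 0.02, 0.001, 0.0005]
labels = [False, False, False, False, True, True]
candidates = [0.0001, 0.001, 0.01, 0.1]
print(best_threshold(densities, labels, candidates))
```

In this toy set the loosest candidate (0.1) also flags a healthy sample and the tighter ones miss anomalies, which is the trade-off the sweep is meant to expose.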
A good starting threshold is the density at 2–3 standard deviations from the mean. The maximum probability density is found by setting the exponent in Equation 1 to 0 (that is, x = μ) and evaluating the equation with the learned variance values. Similarly, the density at 2 standard deviations is found by evaluating Equation 1 at x = μ + 2σ.
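Because (kσ)² / (2σ²) = k² / 2, the density k standard deviations from the mean is the peak density scaled by exp(−k² / 2). A minimal sketch, using a hypothetical learned variance:

```python
import math

def density_at_k_sigma(var, k):
    """Gaussian density evaluated k standard deviations from the mean."""
    peak = 1.0 / math.sqrt(2.0 * math.pi * var)  # exponent is 0 at x = mu
    return peak * math.exp(-0.5 * k * k)         # (k*sigma)^2 / (2*sigma^2) = k^2 / 2

var = 4.0  # hypothetical learned variance (sigma = 2)
for k in (0, 2, 3):
    print(k, density_at_k_sigma(var, k))
```

The k = 0 row is the maximum density; the k = 2 and k = 3 rows are candidate starting thresholds for that variance.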
Feature Selection
Features are built by combining two servo feedback values that, in theory, should scale with each other. Example feature pairs include:
- Input Power and Output Torque
- Input Power and Output Velocity
- Input Power and Acceleration
- Input Torque and Output Velocity
The two feedback variables can be combined into a singular feature by dividing one by the other:

This lets the model flag samples that would pass as non-anomalous if the two feedback values were used as separate features, because each value can sit within its normal range while their ratio does not.
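A minimal sketch of the ratio feature follows. The torque-over-velocity order, the `eps` guard against division by zero, and the sample values are all assumptions made for illustration.

```python
def ratio_feature(numerator, denominator, eps=1e-9):
    """Combine two feedback values into one feature; None if denominator is ~0."""
    if abs(denominator) < eps:
        return None  # e.g. servo at standstill: ratio is undefined, skip the sample
    return numerator / denominator

# Hypothetical (torque, velocity) samples
samples = [(2.0, 100.0), (4.0, 200.0), (4.0, 50.0), (1.0, 0.0)]
features = [ratio_feature(t, v) for t, v in samples]
print(features)
```

In the third sample, both the torque (4.0) and the velocity (50.0) fall within the ranges spanned by the first two samples, yet the ratio is four times larger. Only the combined feature exposes it.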
For the example shown below, a single feature is used composed of the torque and velocity:

Using a data sampling function, values of Torque and Velocity were saved while running various G-Code commands on the servo axis. It is important to sample as much data as possible with various servo parameters. In this case, the maximum velocity and acceleration for the servo were varied from 0.1 to 200.0 and from 10.0 to 500.0 user units, respectively. Roughly 50,000 training samples were taken, and the distribution has been plotted below:

Notice that the data resembles a Gaussian distribution. If that were not the case, a transformation would have to be applied to Gaussianize it. In some cases this can be done by taking the logarithm of the dataset.
The mean, variance, and standard deviation for the training data are shown below:
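As a sketch of the logarithm approach, the snippet below transforms a hypothetical right-skewed feature and compares its skewness before and after. The `offset` parameter and the `skewness` helper are illustrative additions; the same transform would have to be applied to every new sample before its density is evaluated.

```python
import math
from statistics import fmean, pstdev

def log_transform(values, offset=0.0):
    """Apply log(x + offset); requires x + offset > 0 for every value."""
    return [math.log(v + offset) for v in values]

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mu, sd = fmean(values), pstdev(values)
    return fmean([((v - mu) / sd) ** 3 for v in values])

# Hypothetical right-skewed feature values
raw = [1.0, 1.2, 1.5, 2.0, 3.0, 10.0, 50.0]
transformed = log_transform(raw)
print(skewness(raw), skewness(transformed))
```

A skewness closer to zero after the transform suggests the data is nearer to symmetric, although a histogram remains the more reliable check.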

These values are the parameters the model learns and then applies to future servo feedback values.
Using Equation 1, the Probability Density is calculated for each of the training data points, and the following curve is produced:

Procedure
In order to see the model in action, the servo will be configured as the X-axis for a 3D Gantry executing a simple constant velocity move.
The PLC will sample Torque and Velocity values every cycle (8 ms) and classify the servo's behavior as anomalous or non-anomalous. The Gaussian probability density and binary classification output by the model will be saved along with the data sampled at every scan.
The model is expected to output ‘non-anomalous’ while the constant velocity move is running, although an occasional false positive, on the order of one anomalous output per 1,000 samples, could occur. For this reason, additional logic is required to decide when and how to act on the model’s output. For example, the number of anomalous occurrences in the trailing 1,000 samples can be used to trigger an event that notifies the supervisor to check on the servo.
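The trailing-window logic can be sketched with a fixed-length queue, shown here in Python rather than PLC code. The class name and the `max_anomalies` limit are hypothetical; the appropriate limit depends on the false-positive rate that is acceptable for the application.

```python
from collections import deque

class TrailingAnomalyCounter:
    """Count anomalous flags in the last `window` samples and raise an alert."""

    def __init__(self, window=1000, max_anomalies=10):
        self.flags = deque(maxlen=window)
        self.max_anomalies = max_anomalies
        self.count = 0

    def update(self, is_anomalous):
        """Record one classification; return True when the alert should fire."""
        if len(self.flags) == self.flags.maxlen:
            self.count -= self.flags[0]  # oldest flag is about to be evicted
        self.flags.append(bool(is_anomalous))
        self.count += bool(is_anomalous)
        return self.count > self.max_anomalies

# Small window so the behavior is easy to follow
counter = TrailingAnomalyCounter(window=5, max_anomalies=2)
stream = [False, True, False, True, True, False, False, False]
alerts = [counter.update(flag) for flag in stream]
print(alerts)
```

Keeping a running count avoids re-summing the window on every scan, which matters when the check runs inside a short cycle time.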
In order to simulate an anomaly, the X-Axis on the 3D Gantry will experience external resistance from an unknown source (don’t try this at home). This simulated anomaly could correspond to a damaged bearing providing mechanical resistance to the servo, which shows up as a reduced output velocity for the supplied Torque. It is expected that the model outputs ‘anomalous’ while the servo is experiencing these external forces.
Testing
The Gaussian probability density as a function of time for the test procedure described above is plotted below:

Notice the rapid decrease in magnitude around the 6 second mark; this is the exact time the servo experiences external resistance and continues until the 14 second mark when the external resistance is removed.
The binary classification output specified by a threshold of 0.001 is plotted below:

The binary classification reflects what is shown on the Gaussian probability density plot above. Due to the strict threshold, some of the values in the 6–14 s range are not classified as anomalies.
A second set of plots has been generated to demonstrate the repeated loading and unloading of the external resistance/anomaly. Note that this second test uses a model trained with different data, and as a result it has different learned parameters. The probability density is plotted below:

Initially, the density values are scattered with a general horizontal trend. This is expected behavior: the torque changes slightly in order to maintain constant velocity on the servo. Once the loads are applied at roughly the 4.0, 8.0, and 12.0 second marks, the probability density drops off to zero. The plot makes it obvious that something has happened to the servo.
Plotted below is the binary classification for the probability density above, using a threshold of 0.0026, the density at 3 standard deviations from the mean.

The plot confirms what was seen on the probability density plot above, since the classification is a direct function of the density.
Conclusion
Implementing this lightweight Gaussian probability density model for anomaly detection can be a great indicator for when a servo needs maintenance. By running this model as a background process on a PLC, unnecessary downtime and potential damage to the system can be prevented.
Interested in this implementation? Contact me.
References
[1] Ng, A. Anomaly Detection [Lecture]. Machine Learning. Coursera. https://www.coursera.org/learn/machine-learning
[2] Wikipedia contributors. (2021, October 12). Normal distribution. In Wikipedia, The Free Encyclopedia. Retrieved 03:24, October 20, 2021, from https://en.wikipedia.org/w/index.php?title=Normal_distribution&oldid=1049556144
[3] Lebiedzinski, P. (2021, June 24). A single number metric for evaluating object detection models. Medium. Retrieved November 10, 2021, from https://towardsdatascience.com/a-single-number-metric-for-evaluating-object-detection-models-c97f4a98616d