Notes from Industry, MANUFACTURING DATA SCIENCE WITH PYTHON

Anomaly Detection in Manufacturing, Part 1: An Introduction

Work in manufacturing? Supercharge your business with data science and machine learning

Tim von Hahn
Towards Data Science
9 min read · Jun 8, 2021


A critical motor failed. Millions of dollars in lost revenue. Worst of all, it happened on my watch.

In reality, the failure wasn’t attributed to any one person — it was a system failure. But there were missed “signals” along the way, like the temperature probe that showed “spikes” weeks before. Unfortunately, I was not equipped with the tools to identify these problems. Reams of data were constantly collected, but identifying a single deviation was like finding a needle in a haystack.

You may have a similar story of equipment failure that has cost your business immense grief, both in terms of money and effort. Condition monitoring is the process of measuring the parameters of machines — such as temperatures, vibrations, pressures, etc. — in order to detect and prevent failures. However, yesterday’s implementation of condition monitoring is ill-equipped to manage the deluge of data in today’s world.

Several parameters that can be used in condition monitoring. (Image by author)

To find those “needles in the haystack,” and improve productivity, traditional condition monitoring must be combined with data science and machine learning.

Fortunately, the sheer availability of data, and the clear line from theory to application, makes a compelling case for using data science and machine learning techniques in industrial environments. [1] A McKinsey study estimated that the appropriate use of data-driven techniques by process manufacturers “typically reduces machine downtime by 30 to 50 percent and increases machine life by 20 to 40 percent”.

Ultimately, it is here, at the intersection of traditional industry, data science, and machine learning, that we unlock incredible value.

This three-part series will explore the application of data science and machine learning to a problem in manufacturing. In particular, we’ll learn to detect anomalies during metal machining using a variational autoencoder (VAE). Although this application is manufacturing specific, the principles can be used wherever anomaly detection is useful.

In Part 1 (this post), we’ll review what anomaly detection is. We’ll also be introduced to the UC Berkeley milling data set and do some exploratory data analysis — an important first step.

In Part 2, we’ll cover the theory of variational autoencoders. We’ll then build and train VAEs using TensorFlow — all the code is provided so you can easily follow along.

Finally, in Part 3, we’ll use the trained VAEs for anomaly detection. You’ll come to understand how the latent space can be used in anomaly detection. In addition, we’ll make some unique data visualizations to better understand the results.

By the end of this series, I hope you have a newfound appreciation for anomaly detection and how it can be used in a manufacturing environment. Maybe you’ll be inspired to exercise your data science skills in this fascinating domain.

Let’s start by understanding what anomaly detection is, and seeing how autoencoders can be used for anomaly detection.

Anomaly Detection and Autoencoders

The classic definition of an anomaly was given by Douglas Hawkins: “an [anomaly] is an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.” [2] Sometimes, anomalies are clearly identified, and a data scientist can pick them out using straightforward methods. In reality, though, noise in the data makes anomaly detection difficult. Discriminating between the noise and the anomalies becomes the central challenge, as shown below.

The continuum between normal data, noise, and anomalies. The challenge is differentiating between the noise in the data and the anomalies — that is, between weak and strong outliers. (Image by author, inspired by Charu C. Aggarwal in Outlier Analysis)

There are many ways to perform anomaly detection. I highly recommend the book Outlier Analysis by Aggarwal for an excellent overview.

One method of anomaly detection uses an autoencoder. Autoencoders, as shown in the figure below, learn to reconstruct their inputs. However, the reconstruction will never be perfect. Feeding data into an autoencoder that is very different from what the autoencoder was trained on will produce large reconstruction errors. Feeding similar data will produce lower reconstruction errors.

An autoencoder learns to reconstruct its inputs. Here, a simple autoencoder has one input layer, one output layer, and a hidden layer. The hidden units are commonly called codings, or latent variables. The hidden units are in the latent space. (Image by author)

The size of the reconstruction error can be used as a proxy for how abnormal the data is. A threshold can then be set, whereby data producing a reconstruction error above the threshold is considered an anomaly. This is called input space anomaly detection.
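The idea can be sketched in a few lines. Note that the “model” below is only a stand-in that reconstructs every sample as the training mean — not a trained autoencoder — and the 99th-percentile threshold is illustrative:

```python
import numpy as np

def reconstruction_errors(model, x):
    """Mean squared reconstruction error per sample."""
    x_hat = model(x)
    return np.mean((x - x_hat) ** 2, axis=1)

# Stand-in "autoencoder": reconstructs every sample as the training mean.
rng = np.random.default_rng(42)
train = rng.normal(0.0, 1.0, size=(1000, 8))
mean_vec = train.mean(axis=0)
model = lambda x: np.broadcast_to(mean_vec, x.shape)

# Set the threshold from errors on normal (training) data,
# e.g. flag anything above the 99th percentile as anomalous.
errors = reconstruction_errors(model, train)
threshold = np.percentile(errors, 99)

# A sample unlike anything seen in training produces a large error.
odd_sample = np.full((1, 8), 10.0)
is_anomaly = reconstruction_errors(model, odd_sample) > threshold
```

Swap the stand-in for a trained autoencoder and the same thresholding logic applies unchanged.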

Inherently, the power of the autoencoder lies in its ability to learn in a self-supervised way. Yann LeCun described the strength of self-supervised learning in his Turing Award address: self-supervised learning allows models to “learn about the world without training it for a particular task.” This allows large swaths of data to be used to train the model — data that would go unused by supervised learning techniques.

The power of self-supervised learning makes it attractive for manufacturing and industrial environments, where much of the data is not properly labelled, or would be too costly to label. The use of an autoencoder for anomaly detection is one such instantiation of self-supervised learning.

Introducing the Metal Machining Data Set

We’ll further explore the concept of self-supervised learning, anomaly detection, and autoencoders as we build a variational autoencoder to detect abnormalities on tools during metal machining.

The metal machining data set, or milling data set, we’ll be using is from UC Berkeley. The data is hosted on the NASA Prognostics Center of Excellence web page and is freely available. In the following sections we’ll review what milling is and then explore the data.

What is Milling?

A milling tool in action. (Photo by Daniel Smyth on Unsplash)

In milling, a rotary cutter, like that in the picture above, removes material as it moves along a work piece. Most often, milling is performed on metal — it’s metal machining.

The picture below demonstrates a face milling procedure. The cutter is fed forward while rotating. As the cutter rotates, the tool inserts “bite” into the metal and remove it.

A milling tool has several tool inserts on it. As the tool rotates, and is pushed forward, the inserts cut into the metal. (Image modified by author, Public Domain)

Over time, the tool inserts wear. Specifically, the flank of the tool wears, as shown below. In the UC Berkeley milling data set the flank wear (VB) is measured from cut to cut. This VB value will be used for labeling purposes.

Flank wear on a tool insert (perspective and front view). VB is the measure of flank wear. (Image by author)

Data Exploration

Note: I won’t cover all the code for the data exploration — follow along in the Colab notebook to see it all.

Data exploration is an important step when tackling any new data science problem. As such, we need to familiarize ourselves with the UC Berkeley milling data set before we start any sort of model building.

Where to begin? The first step is understanding how the data is structured. How is the data stored? In a database? In an array? Where is the meta-data (things like labels and time-stamps)?

Data Structure

The UC Berkeley milling data set is contained in a structured MATLAB array. We can load the .mat files using the scipy.io module and the loadmat function.

# load the data from the MATLAB file
import scipy.io as sio

m = sio.loadmat(folder_raw_data / 'mill.mat', struct_as_record=True)

The data is stored in a dictionary. Only the 'mill' key contains useful information.

# store the 'mill' data in a separate np array
data = m['mill']

We can see what the data array is made up of.

# store the field names in the data np array in a tuple, l
l = data.dtype.names
print('List of the field names:\n', l)
>>> List of the field names:
>>> ('case', 'run', 'VB', 'time', 'DOC', 'feed', 'material', 'smcAC', 'smcDC', 'vib_table', 'vib_spindle', 'AE_table', 'AE_spindle')

Meta-Data and Labels

The documentation included with the UC Berkeley milling data set highlights important meta-data. The data set is made up of 16 cases of milling tools performing cuts in metal. Three cutting parameters were used in the creation of the data:

  • the metal type (either cast iron or steel, labelled as 1 or 2 in the data set, respectively)
  • the depth of cut (either 0.75 mm or 1.5 mm)
  • the feed rate (either 0.25 mm/rev or 0.5 mm/rev)

Each of the 16 cases is a combination of the cutting parameters (for example, case one has a depth of cut of 1.5 mm, a feed rate of 0.5 mm/rev, and is performed on cast iron).

The cases are made up of individual cuts from when the tool is new to degraded or worn. There are 167 cuts (called ‘runs’ in the documentation) amongst all 16 cases. Many of the cuts are accompanied by a measure of flank wear (VB). We’ll use this later to label the cuts as either healthy, degraded, or worn.

Finally, six signals were collected during each cut:

  • Acoustic emission (AE) signals from the spindle and table.
  • Vibration from the spindle and table.
  • AC/DC current from the spindle motor.

The signals were collected at 250 Hz and each cut had 9000 sampling points, for a total signal length of 36 seconds.

We will extract the meta-data and labels from the numpy array and store it as a pandas dataframe — we’ll call this dataframe df_labels since it contains the label information we’ll be interested in. This is how we create the dataframe:

Pandas dataframe showing the results of the labelling.
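The full construction is in the notebook, but a minimal sketch looks like the following. The field names follow the data set documentation; the `.item()` extraction details are assumptions about how `loadmat` returns the structured array:

```python
import numpy as np
import pandas as pd

def create_labels_df(data):
    """Collect the meta-data fields for each cut into a dataframe."""
    rows = []
    for i in range(data.shape[1]):  # data has shape (1, number_of_cuts)
        cut = data[0, i]
        rows.append({
            'case': int(cut['case'].item()),
            'run': int(cut['run'].item()),
            # not every cut has a flank wear (VB) measurement
            'VB': cut['VB'].item() if cut['VB'].size else np.nan,
            'time': cut['time'].item(),
            'DOC': cut['DOC'].item(),
            'feed': cut['feed'].item(),
            'material': int(cut['material'].item()),
        })
    return pd.DataFrame(rows)
```

With the array loaded as `data = m['mill']`, calling `create_labels_df(data)` gives one row of meta-data per cut.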

In the above table, from df_labels.head(), you can see that not all cuts are labelled with a flank wear (VB) value. Later, we’ll set categories for the tool health — either healthy, degraded, or worn (failed). For the cuts without a flank wear (VB) value, we can reasonably estimate the tool health category based on nearby cuts that do have wear values.
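The mapping from VB to a health category can be sketched like this — the wear cut-offs below are illustrative, not values fixed by the data set:

```python
import numpy as np
import pandas as pd

def label_tool_health(vb, healthy_max=0.2, degraded_max=0.7):
    """Map a flank wear value VB (mm) to a tool-health category.

    The cut-off values are illustrative placeholders.
    """
    if pd.isna(vb):
        return np.nan  # estimate later from neighbouring cuts
    if vb <= healthy_max:
        return 'healthy'
    if vb <= degraded_max:
        return 'degraded'
    return 'worn'
```

For the cuts missing a VB measurement, something like `df_labels['VB'].interpolate()` within each case is one reasonable way to estimate wear from the neighbouring cuts.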

Data Visualization

Visualizing a new data set is a great way to grasp what is going on, and to detect any problems. I also love data visualization, so we’ll create a beautiful graphic using seaborn and Matplotlib.

There are only 167 cuts in this data set, which isn’t a huge amount. We can visually inspect each cut to find abnormalities. Fortunately, I’ve already done that for you. Below is a highlight.

First, we’ll look at a fairly “normal” cut — cut number 167.

Simple plot of the signals from cut 167.

However, if you look at all the cuts, you’ll find that cuts 18 and 95 (index 17 and 94) are strange — they will need to be discarded before we start building our anomaly detection model.

Here is cut number 18:

Simple plot from cut 18. The cut data is clearly corrupted.

Here is cut number 95:

Simple plot from cut 95. The cut data is clearly corrupted.

Finally, we’ll create a plot that cleanly visualizes all six signals together (acoustic emissions, vibrations, and currents).

Cut number 146. (Image by author)
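A bare-bones version of such a plot can be sketched as follows (the field access mirrors the structured array shown earlier; the styling of the actual figure is left to the notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

SIGNALS = ['smcAC', 'smcDC', 'vib_table',
           'vib_spindle', 'AE_table', 'AE_spindle']

def plot_cut(data, cut_index):
    """Plot the six signals of one cut on a shared time axis."""
    cut = data[0, cut_index]
    fig, axes = plt.subplots(len(SIGNALS), 1, sharex=True, figsize=(8, 10))
    for ax, name in zip(axes, SIGNALS):
        signal = np.squeeze(cut[name])
        t = np.arange(len(signal)) / 250.0  # signals sampled at 250 Hz
        ax.plot(t, signal, linewidth=0.5)
        ax.set_ylabel(name)
    axes[-1].set_xlabel('time (seconds)')
    fig.tight_layout()
    return fig
```

For example, `plot_cut(data, 145)` would show cut number 146 (the array is zero-indexed).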

Conclusion

Data science and machine learning are a strong fit for manufacturing environments. To that end, we’ve reviewed the concept of anomaly detection using autoencoders. This self-supervised learning method can be useful in a manufacturing environment to help detect, and prevent, machinery failures.

In this post we also explained what metal machining is — in the context of milling — and we explored the UC Berkeley milling data set. In Part 2, we will build a variational autoencoder and train it on the milling data.

References

[1] Economist. (2020). Businesses are finding ai hard to adopt. The Economist, ISSN 0013–0613.

[2] Hawkins, D. M. (1980). Identification of outliers (Vol. 11). London: Chapman and Hall.

This article originally appeared on tvhahn.com. In addition, the work is complementary to research published in IJHM. The official GitHub repo is here.

Except where otherwise noted, this post and its contents are licensed under CC BY-SA 4.0 by the author.
