
Quality control is an important step in every production system, and many business investments aim to reinforce this process to guarantee higher-quality products. In recent years, Machine Learning solutions have played a key role in these investments, thanks to their ability to adapt easily to every context and to the great results they achieve.
In this article, I present an AI solution for quality control in a standard production unit, framed as a classification problem. I try to achieve the best possible performance, giving a visual explanation of the results and taking useful human insights into account.
I want to underline this last point because human insights are often underestimated in Machine Learning! It's no surprise that they allow us to achieve the best performance and to adopt the smartest solutions.
THE DATASET
I took the dataset for this analysis from the trusty UCI repository (Steel Plates Faults Data Set). The data description is quite sparse, but the dataset is easy to understand: it contains meta information about steel plates, such as luminosity, perimeter, edge, thickness, area, type of steel and so on (27 independent variables in total).
We can imagine managing a factory that processes steel and, in the final step of the production system, produces steel plates to sell on the wholesale market. Our aim is to maximize the efficiency of the production system by trying to identify the possible types of steel plate faults (7 in total) using only the metadata of the products. In this way, we will be able to identify the weaknesses of the production system and react accordingly.
With 1941 samples at our disposal, the distribution of faults is unbalanced in favor of the 'Other_Faults' class:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('./Steel_Plates_Faults.csv')
label = df.Fault                      # target: type of fault
df = df.drop('Fault', axis=1)         # 27 independent variables
label.value_counts().plot.pie(figsize=(6,6))

FIRST MODEL
We are in a hurry and want to get some results immediately, so we take all our data and fit a Gradient Boosting classifier.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X_train, X_test, y_train, y_test = train_test_split(df, label, random_state=42, test_size=0.2)
gbc = GradientBoostingClassifier(n_estimators=500)
gbc.fit(X_train, y_train)
We are able to achieve an overall ACCURACY of 0.807.
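This figure can be checked directly (a minimal sketch, assuming scikit-learn's accuracy_score):

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, gbc.predict(X_test)))  # ~0.807 on the held-out test set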
import numpy as np
from sklearn.metrics import confusion_matrix

cnf_matrix = confusion_matrix(y_test, gbc.predict(X_test))
plot_confusion_matrix(cnf_matrix, classes=np.unique(label), title="Confusion matrix")  # plotting helper, not shown here
plt.show()
If we plot the confusion matrix, it's possible to see that our algorithm isn't able to classify the Other_Faults class well. This first result forces us to slow down and think a little bit.
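A per-class report makes this weakness explicit (a minimal sketch, assuming scikit-learn's classification_report):

from sklearn.metrics import classification_report
print(classification_report(y_test, gbc.predict(X_test)))  # look at the recall of Other_Faults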

Personally, I find the best way to investigate a phenomenon is to plot it. For this reason, I took all the variables that we used to fit our Gradient Boosting and plotted them, reducing the 27 original dimensions to only 2 by fitting a t-SNE.
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.manifold import TSNE

# Standardize the features (the scaler is fit on the training data only)
scaler = StandardScaler()
scaler.fit(X_train.astype('float64'))
# Project the whole dataset into 2 dimensions
tsne = TSNE(n_components=2, random_state=42, n_iter=300, perplexity=5)
T = tsne.fit_transform(scaler.transform(df.astype('float64')))
plt.figure(figsize=(16,9))
colors = {0:'red', 1:'blue', 2:'green', 3:'pink', 4:'black', 5:'orange', 6:'cyan'}
plt.scatter(T.T[0], T.T[1], c=[colors[i] for i in LabelEncoder().fit_transform(label)])

Here is a clear and beautiful explanation of our results! Neither we nor the algorithm can identify a clear separation between Other_Faults (pink dots) and the remaining classes. In this light, the recall values for Other_Faults in the confusion matrix make sense: Other_Faults is a noisy class, and we have to take this aspect into account in our analysis.
KEY POINT
What does it mean that Other_Faults is a noisy class? To answer this question, we take advantage of those often underestimated human insights.
Suppose that, today, humans carry out the quality check in our production system. If they inspect a steel plate with a notable bump or a stain on its surface, the piece is easy to label. But if they inspect a plate with a bump and a stain on the surface at the same time, labeling it is not so easy! (If you are in doubt, put it in Other_Faults.)
This is only one example, but it clarifies our situation: the Other_Faults class needs to be treated with care because it incorporates a lot of undefined cases from the quality checks.
With this precious consideration in mind, we can proceed safely! As above, I plot all the steel plates in our dataset, taking into consideration all the variables but excluding the plates that belong to Other_Faults (pink dots).
# Same t-SNE projection, but excluding the Other_Faults samples
tsne = TSNE(n_components=2, random_state=42, n_iter=300, perplexity=5)
T = tsne.fit_transform(scaler.transform(df[label != 'Other_Faults'].astype('float64')))
plt.figure(figsize=(16,9))
colors = {0:'red', 1:'blue', 2:'green', 3:'black', 4:'orange', 5:'cyan'}
plt.scatter(T.T[0], T.T[1], c=[colors[i] for i in LabelEncoder().fit_transform(label[label != 'Other_Faults'])])

Pretty good! We have removed the noise caused by Other_Faults and now the classes are well separated.
SECOND MODEL
At this point, we try to build a model that doesn't take the Other_Faults class into account. We fit a Gradient Boosting as above.
# Remove the Other_Faults samples from both training and test sets
X_train2, y_train2 = X_train[y_train != 'Other_Faults'].copy(), y_train[y_train != 'Other_Faults'].copy()
X_test2, y_test2 = X_test[y_test != 'Other_Faults'].copy(), y_test[y_test != 'Other_Faults'].copy()
gbc2 = GradientBoostingClassifier(n_estimators=500)
gbc2.fit(X_train2, y_train2)
Now the ACCURACY is 0.909, an improvement of about 10 percentage points.
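Again, the figure can be verified directly (a sketch, reusing accuracy_score from above):

print(accuracy_score(y_test2, gbc2.predict(X_test2)))  # ~0.909 without the Other_Faults samples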
Certainly, this is a good result and confirms the soundness of our reasoning, but this second model reproduces an unrealistic scenario: we are pretending that the Other_Faults class doesn't exist and that all faults are easy to distinguish and to label. With our first model, we have proved that this is not our case. So we need a way to translate the uncertainty that appears when people try to classify an ambiguous steel plate into machine learning language.
IMPOSE A THRESHOLD
I have encoded this uncertainty by imposing a threshold on each class of our final predictions. To build these thresholds, I made predictions with our second model on the Other_Faults samples and stored them, keeping them separated by predicted class (as shown below).
def predict(feature, model, threshold_map=None):
    """Predict the label and its confidence, applying an optional per-class threshold."""
    confidence = model.predict_proba(feature).max()
    label = model.predict(feature)[0]
    if threshold_map and label in threshold_map:
        if confidence >= threshold_map[label]:
            return {"label": label, "confidence": confidence}
        else:
            # Below the class threshold: fall back to the 'indecision' class
            return {"label": "OTHERS", "confidence": confidence}
    elif threshold_map is None:
        return {"label": label, "confidence": confidence}
    else:
        print(label, 'not in threshold map')
import tqdm

# Predict (without thresholds) on the Other_Faults training samples
pred_lab = []
pred_conf = []
for row in tqdm.tqdm(X_train[y_train == 'Other_Faults'].values):
    pred = predict([row], gbc2)
    pred_lab.append(pred['label'])
    pred_conf.append(pred['confidence'])

# Group the confidence scores by predicted class
other_pred = pd.DataFrame({'label': pred_lab, 'pred': pred_conf})
diz_score = other_pred.groupby('label')['pred'].apply(list).to_dict()
plt.figure(figsize=(18,5))
plt.boxplot(list(diz_score.values()), labels=list(diz_score.keys()))
plt.grid(False); plt.show()

Next, I calculated a per-class threshold: for every predicted class, I took the 30th percentile (red squares) of its confidence score distribution.
# 30th percentile of the confidence scores, computed per predicted class
threshold_p = {}
for lab in diz_score.keys():
    threshold_p[lab] = np.percentile(diz_score[lab], 30)

plt.boxplot(list(diz_score.values()), labels=list(diz_score.keys()))
plt.plot(range(1, len(threshold_p.keys())+1), list(threshold_p.values()), 'rs')
plt.show()

In practice, we use these thresholds to decide whether a steel plate belongs with certainty to a given fault class. If our prediction falls below the threshold, we are not confident enough to classify it, so we label it as Other_Faults.
Adopting this technique, we achieve an ACCURACY of 0.861 (on test data without Other_Faults). If we increase the thresholds we lose some accuracy points, but we gain precision, and so on.
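A minimal sketch of how this figure can be computed, reusing the predict helper and the threshold_p map defined above (samples pushed to "OTHERS" count as errors here):

thr_pred = [predict([row], gbc2, threshold_map=threshold_p)["label"] for row in X_test2.values]
print(accuracy_score(y_test2, thr_pred))  # ~0.861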

Regarding the Other_Faults class, we are assuming that it exists as an 'indecision class', which contains all the samples classified by the model with low confidence. At the same time, we are assuming that the samples of the original Other_Faults class belong to the class pointed to by the model whenever the confidence is higher than the threshold (we trust it).
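As a quick sanity check (a sketch reusing the other_pred scores and threshold_p from above), we can count how many of the original Other_Faults training samples would be reassigned to a specific fault class under this assumption:

reassigned = sum(conf >= threshold_p[lab] for lab, conf in zip(other_pred['label'], other_pred['pred']))
print(reassigned, 'of', len(other_pred), 'Other_Faults samples exceed their class threshold')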
Finally, if we plot our original data again, adopting this resizing of the Other_Faults class, we can see a reduction of the noise (the pink dots are more concentrated).
# Thresholded predictions on the whole dataset, used to colour the final t-SNE plot
other_final_pred = []
for row in tqdm.tqdm(pd.concat([X_train, X_test]).values):
    other_final_pred.append(
        predict([row], gbc2, threshold_map=threshold_p)["label"]
    )
tsne = TSNE(n_components=2, random_state=42, n_iter=300, perplexity=5)
T = tsne.fit_transform(scaler.transform(pd.concat([X_train, X_test]).astype('float64')))
plt.figure(figsize=(16,9))
colors = {0:'red', 1:'blue', 2:'green', 3:'pink', 4:'black', 5:'orange', 6:'cyan'}
plt.scatter(T.T[0], T.T[1], c=[colors[i] for i in LabelEncoder().fit_transform(other_final_pred)])

SUMMARY
In this post, I proposed a workflow for fault classification in Quality Control. Starting from some samples of steel plates, I analyzed them in order to classify faults correctly. After the first step, I noticed some ambiguous behavior in the data structure, so I started to investigate. Based on a human insight, I suggested a new vision of the problem, tried to solve it and proposed my personal solution.
Keep in touch: Linkedin