Using Machine Learning to Classify Server Incidents

A step-by-step guide to classifying server incidents with machine learning

Caio Santos Pedroso
Towards Data Science


Introduction

When we talk about incident management, automating incident resolution is a must-have for every company. But for automation to act on the right incident with the right information, we need to extract information from the server incident automatically. Most of that content is text, i.e., the incident description, which we then classify (we call it a "match") to decide which of the already available automations will most likely resolve the incident.

The common way to retrieve information from incidents without deep programming skills is to use regular expressions, but if you have ever worked with them you know how messy they can get, and even a minimal change in the incident's content can cause the regex to fail.

In this post, we describe how we are using machine learning techniques to improve that match.

Photo by Christina @ wocintechchat.com on Unsplash

Data Preparation

The data used to train the model is based on incidents that were resolved by one of the available automations in the last six months: around 100k incidents from more than 180 clients in 9 different countries. We restrict the data to six months because we continuously deploy new automations and fix regular expression issues, so the shorter interval prevents wrong or outdated data from being used.

The raw data set that we retrieve from our data lake has 14 columns, but we will focus only on the necessary ones to create the model.

  • automation_name: Name of the automation that was executed in this incident (target of our classification)
  • component: High-level name of the component affected by the issue in the incident
  • sub component: Specific name of the component affected by the issue in the incident
  • summary: Quick description of the incident
Example of data contained in the data set

In this particular case, since component and sub component complement the summary information, we decided to concatenate these three fields into a single "feature" field, which makes the next steps easier. Besides that, we set all content to lowercase.
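A minimal sketch of that concatenation step with pandas; the column values below are made up for illustration, only the column names come from the post:

```python
import pandas as pd

# Toy incident data; the real data set has 14 columns.
df = pd.DataFrame({
    "automation_name": ["clean_disk", "restart_service"],
    "component": ["Linux OS", "Windows OS"],
    "sub component": ["Filesystem", "Service"],
    "summary": ["Disk /var is 95% full", "W3SVC stopped unexpectedly"],
})

# Concatenate the three text fields and lowercase the result.
df["feature"] = (
    df["component"] + " " + df["sub component"] + " " + df["summary"]
).str.lower()

print(df["feature"].tolist())
# → ['linux os filesystem disk /var is 95% full',
#    'windows os service w3svc stopped unexpectedly']
```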

In some countries we have incidents in more than one language, English and Spanish for example. To solve that we used two libs: langdetect to check the language of the incident (we adopted English as our main language) and googletrans to translate the incident to English when necessary. This process can take a few hours depending on your data set size, so we tried to filter only the text that actually needed translation.

To finish this step, we separate the data into feature and class: automation_name is the class we want to predict, and feature is the result of the processing we did above.

Modeling

Having the two fields ready, we can finally start the machine learning phase itself. To make the model simple to run, we create a pipeline with three main components:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

pipeline = Pipeline(steps=[
    ('bow', CountVectorizer(analyzer=proc_text)),
    ('tfidf', TfidfTransformer()),
    ('classifier', model)
])
  • CountVectorizer: This function tokenizes our data; the result is a sparse representation of the count of each word

One important step in this phase is to remove stop words and punctuation; these two elements add complexity that is unnecessary for our case. So, using the nltk library and the function below, passed to CountVectorizer as the analyzer, we get rid of them.

import string
import nltk
# Necessary to download the stopwords corpus, only the first time
# nltk.download('stopwords')
from nltk.corpus import stopwords

sw = set(stopwords.words('english'))

def proc_text(text):
    string_w_no_punc = [char for char in text if char not in string.punctuation]
    string_w_no_punc = ''.join(string_w_no_punc)
    return [word for word in string_w_no_punc.split() if word.lower() not in sw]
  • TfidfTransformer: From the sparse matrix returned by CountVectorizer, this function applies the TF-IDF (Term Frequency-Inverse Document Frequency) representation to it. The TF-IDF goal is to measure the impact of a word within one document (TF) and across all documents (IDF).
  • Model: The last component of the pipeline is to apply the classifier.
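To make the TF-IDF intuition concrete, here is a tiny hand-computed sketch using the classic weighting tf × log(N/df); note that sklearn's TfidfTransformer uses a smoothed variant of this formula, so the exact numbers differ:

```python
import math

# Three toy "documents" (already tokenized incident summaries).
docs = [
    ["disk", "full", "disk"],
    ["service", "stopped"],
    ["disk", "error"],
]

def tfidf(term, doc, docs):
    # Term frequency: share of the document's words that are `term`.
    tf = doc.count(term) / len(doc)
    # Document frequency: in how many documents the term appears.
    df = sum(1 for d in docs if term in d)
    # Classic IDF; rare terms get a higher weight.
    idf = math.log(len(docs) / df)
    return tf * idf

# "disk" appears in 2 of 3 docs, so its weight is dampened;
# "service" appears in only 1 of 3, so it is boosted.
print(round(tfidf("disk", docs[0], docs), 4))     # → 0.2703
print(round(tfidf("service", docs[1], docs), 4))  # → 0.5493
```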

After the pipeline, it is business as usual: we split the data into training and test sets and evaluate the results of each classifier.
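As a self-contained sketch of that step, with toy data and MultinomialNB as a stand-in classifier; the post's actual model and the proc_text analyzer would slot into the same pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Toy feature/class data standing in for the real incident set.
X = [
    "linux os filesystem disk var is full", "disk full on root filesystem",
    "filesystem usage above threshold", "disk space alert on var",
    "windows os service w3svc stopped", "service spooler is not running",
    "service stopped unexpectedly", "windows service down alert",
]
y = ["clean_disk"] * 4 + ["restart_service"] * 4

pipeline = Pipeline(steps=[
    ('bow', CountVectorizer()),       # default tokenizer for this sketch
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB()),  # stand-in classifier
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)
print(f1_score(y_test, pred, average='macro'))
```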

Evaluation

In this particular case, we want a balanced metric between Precision and Recall, since at this point neither kind of error causes more trouble than the other. With this in mind, we chose to evaluate the macro-averaged F1-Score. We tried a few classifiers, and you can check the results below:

Results of the classifiers tested

The model that best fit our data was XGBoost, but there are a lot more things you can do to improve these numbers, like hyperparameter optimization, upsampling the minority classes, and a few others. I will let you try those on your own. :)
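As one illustration of such tuning, a small grid search over the pipeline could look like the sketch below; the parameter values are illustrative, not the ones we used, and MultinomialNB stands in for XGBoost (with XGBoost you would tune parameters such as max_depth or learning_rate instead):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# Tiny toy data for the sketch.
X = ["disk full", "disk space low", "filesystem full alert",
     "service stopped", "service down", "restart service needed"]
y = ["clean_disk"] * 3 + ["restart_service"] * 3

pipeline = Pipeline(steps=[
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', MultinomialNB()),
])

# Illustrative grid: toggle IDF weighting and the NB smoothing parameter.
param_grid = {
    'tfidf__use_idf': [True, False],
    'classifier__alpha': [0.1, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring='f1_macro')
search.fit(X, y)
print(search.best_params_)
```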

To use this in production, we were required to have some guarantee that the classifier predicts the automation with a probability above a certain threshold. To do this we used predict_proba from the sklearn lib, which returns the probability the classifier assigned to each class. With those probabilities and the threshold, which we set at 97%, we were able to implement and monitor the classifier in production without worries. Here is the snippet that we used for that:

pred_proba = pipeline.predict_proba(X)
pred = pipeline.predict(X)
confidence = []
for item in pred_proba:
    confidence.append(item.max())
df["predicted_automation"] = pred
df["confidence"] = confidence
df["applicable"] = df["confidence"].apply(lambda x: "OK" if x > 0.97 else "NOK")

I hope this article helps you understand how to work with text data using machine learning, and how we found an opportunity and applied it in our day-to-day job to improve our clients' satisfaction.
