Anomaly Detection is in the Eye of the Beholder

What a deck of playing cards reveals about detecting outliers

Image by Author

In cybersecurity, Anomaly Detection is focused on finding unusual events that might be considered cyber-attacks. The promise of anomaly detection is that the algorithm will detect attacks that have never been seen before. Some of the problems with network data are that there is so much of it, the number of attacks is low, and attacks evolve every day.

Many intrusion detection applications of machine learning rely on supervised learning. Supervised machine learning is good at detecting known attacks. With proper fitting, a supervised machine learning algorithm may even be able to find some novel attacks. But anomaly detection takes a fresh look at the data without the predefined attack signatures.

Anomalies Are Normal

When processing network traffic data, an anomaly detection algorithm may find events that are truly anomalies, but they might not be the events you were looking for. Large computer networks have a rhythm. They have processes that run periodically. They generally have the same users who do the same thing every day. But network data is far from regular. There are events that happen on networks that interrupt the normal routine. There are operational events that result from system errors, defects, or misconfigurations. There are system changes that are deployed on servers to add software features and to patch security vulnerabilities. There are new users who start working and others whose jobs change and who find themselves with new responsibilities. With all of these changes occurring, which events are really anomalies?

Detecting cyber-attacks in a changing network ecosystem with an anomaly detection algorithm is very challenging, since an anomaly may have nothing to do with exploit attempts. Although it seems contradictory, anomalies are normal.

It is important to realize that the anomalies you find may not be the anomalies you are actually looking for. This applies to intrusion detection as well as to many other areas of unsupervised learning. Just because a piece of data is different does not mean it is bad. In fraud detection, you might look at the buying patterns of customers and see unusual purchases in a store where the customer has never done business. It may be fraudulent activity, or it may just be that the customer decided to do something random, different, or outside of normal. This is one of the reasons why anomaly detectors have such high false positive rates.

Do not get discouraged and revert to supervised learning. There are ways to improve detection of the events you really want to find. How can you get better results with your algorithm?

No Single Right Way to Cluster

Anomaly detection is similar to a Clustering problem. Clustering is also an example of unsupervised learning. In clustering, the goal is to group items together that are most like each other and least like the items in the other groups. A known problem with clustering is that there is no single algorithm that clusters all data successfully. The challenge with clustering is that the groups may be similar and dissimilar for very different reasons. Clusters can be measured using density functions or distance functions, but even then, some groupings in data cannot easily be found. For example, certain contiguous shapes in data may not cluster well without a specialized clustering algorithm.

This is why Estivill-Castro pronounced, "Clustering is in the eye of the beholder." [1] He argued that each clustering algorithm is a mathematical formalization of how a particular researcher characterizes a cluster. For each inductive principle, there are different clustering algorithms to suit the intentions of the researcher.

What a Deck of Cards Reveals

To illustrate with a data set that you are probably familiar with, consider a deck of playing cards. A standard deck of playing cards has 52 cards. Each card will be a record in our data set. Each card has a set of features, such as the suit, the face value of the card, and the color. Did you know that each card has a Unicode value? We can include that as a feature as well. Most decks also add two joker cards. The joker card is something of an anomaly, since it does not have a suit or a specific color, but it does have a face value and a Unicode value.
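The deck described above can be written down as a small data set. The following is a minimal sketch, not the author's code: the `Card` class, the `build_deck` function, and the choice to record the jokers' suit and color as `None` are assumptions made for illustration. The code points come from the Unicode "Playing Cards" block, where each suit has its own base value and the offset `0xC` is skipped because it is reserved for the Knight card.

```python
from dataclasses import dataclass
from typing import List, Optional

# Base code point for each suit in the Unicode "Playing Cards" block.
SUIT_BASE = {"spades": 0x1F0A0, "hearts": 0x1F0B0,
             "diamonds": 0x1F0C0, "clubs": 0x1F0D0}
SUIT_COLOR = {"spades": "black", "hearts": "red",
              "diamonds": "red", "clubs": "black"}
# Offset of each face value within a suit (0xC is the Knight, not used here).
RANK_OFFSET = {"A": 0x1, "2": 0x2, "3": 0x3, "4": 0x4, "5": 0x5,
               "6": 0x6, "7": 0x7, "8": 0x8, "9": 0x9, "10": 0xA,
               "J": 0xB, "Q": 0xD, "K": 0xE}


@dataclass(frozen=True)
class Card:
    """One record in the data set (hypothetical structure)."""
    face: str
    suit: Optional[str]
    color: Optional[str]
    codepoint: int


def build_deck() -> List[Card]:
    """Return 52 standard cards plus two jokers."""
    deck = [Card(face, suit, SUIT_COLOR[suit], base + offset)
            for suit, base in SUIT_BASE.items()
            for face, offset in RANK_OFFSET.items()]
    # Assumption: jokers carry no suit or color, mirroring the text.
    # U+1F0CF and U+1F0BF are two of the joker code points in Unicode.
    deck.append(Card("Joker", None, None, 0x1F0CF))
    deck.append(Card("Joker", None, None, 0x1F0BF))
    return deck


deck = build_deck()
print(len(deck))                # 54
print(hex(deck[0].codepoint))   # 0x1f0a1 (ace of spades)
```

Each record now has four features to choose from, and the two jokers are the only records with missing values for suit and color.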

Given this deck of cards, if you were asked to cluster them into reasonable groups, how would you organize them? Remember, in clustering, your goal is to select groupings that make each card most similar to those in its own cluster and most dissimilar to those in other clusters.

You might group the cards by suit. You could put all of the diamonds together, all of the hearts together, all of the clubs together, and all of the spades together. The result is 13 cards in each suit. What about the jokers? They are something of an anomaly, so maybe they should go into their own group.

Image by Author

You might group the cards into clusters by color, with all of the red cards in one group and all of the black cards in another group. That gives 26 cards in each cluster, but the jokers are clearly an anomaly, since they do not fall into either color group.

Image by Author

Finally, you might group the cards by face value. You could include all of the aces in one group, all of the twos in another group, and so on. Since the jokers are similar in appearance to each other, you again might include them in their own cluster. In this case, the jokers may not appear to be anomalies, since they have a face value.

Image by Author

Which clustering solution is correct? Each one is correct. It depends on your goal and how you approach the problem. If your objective was to group the cards into as few groups as possible, then color seems like the best feature. If you wanted to group into as many groups as possible, then grouping by face value makes the most sense.

It would be great if unsupervised machine learning algorithms were magical. If they were, you could give the algorithm a set of data without any initial analysis and get the correct solution. But as you can see from the playing card example, the solution you get depends on the features you use to group the data.

For this discussion, it is also important to consider that some groupings clearly revealed the jokers as anomalies, and other solutions did not. From this simple example, it is clear that anomaly detection is also in the eye of the beholder.
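To make the comparison concrete, here is a rough sketch that groups the same 54 cards by each feature and measures how much the jokers stand out. The `joker_contrast` ratio (joker group size divided by the largest other group) is a made-up measure for this illustration, not an established anomaly score, and the `"none"` placeholder for the jokers' suit and color is an assumption.

```python
from collections import Counter

FACES = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOR = {"spades": "black", "hearts": "red",
              "diamonds": "red", "clubs": "black"}

# 52 standard cards plus two jokers with placeholder suit and color.
deck = [{"face": face, "suit": suit, "color": color}
        for suit, color in SUIT_COLOR.items() for face in FACES]
deck += [{"face": "Joker", "suit": "none", "color": "none"} for _ in range(2)]


def group_sizes(cards, feature):
    """Count how many cards share each value of the chosen feature."""
    return Counter(card[feature] for card in cards)


def joker_contrast(cards, feature):
    """Joker group size relative to the largest other group.

    Smaller values mean the jokers look more unusual under this feature.
    """
    sizes = group_sizes(cards, feature)
    joker_key = next(c[feature] for c in cards if c["face"] == "Joker")
    joker_size = sizes.pop(joker_key)
    return joker_size / max(sizes.values())


for feature in ("suit", "color", "face"):
    print(feature, len(group_sizes(deck, feature)),
          round(joker_contrast(deck, feature), 3))
# suit 5 0.154
# color 3 0.077
# face 14 0.5
```

Under color, the jokers form a group of 2 next to groups of 26, so they stand out sharply. Under face value, a group of 2 sits beside groups of 4 and looks much less remarkable. The records are identical in all three runs; only the chosen feature changes.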

Feature Selection

When applying anomaly detection to intrusion detection, or any other application, the most important step is to select the most appropriate features. If you want to get magical results, you could use an approach like feature bagging, which fits detectors on random subsets of features, but unfocused feature selection may not give you the solution you were really looking for. Instead, think about what an anomaly means in the context of your network and your data set.

When selecting features, you may find very natural features that exist in the data set. In the playing card example, the color, suit, and face value are straightforward. But what if the basic features do not help to point out anomalies?

Perhaps you want to see what is abnormal for certain days of the week or hours of the day. Extracting the day of week and hour of day from a date feature in network data will allow an algorithm to find some anomalous activity.
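As a minimal sketch of this kind of feature extraction, the function below derives the day of week and hour of day from an ISO 8601 timestamp string. The function name `time_features`, the output field names, and the timestamp format are assumptions; real network logs may need a different parser.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_features(timestamp: str) -> dict:
    """Derive calendar features from an ISO 8601 timestamp string."""
    moment = datetime.fromisoformat(timestamp)
    weekday = moment.weekday()  # Monday is 0, Sunday is 6
    return {
        "day_of_week": DAY_NAMES[weekday],
        "hour_of_day": moment.hour,
        "is_weekend": weekday >= 5,
    }


print(time_features("2024-03-03T02:15:00"))
# {'day_of_week': 'Sunday', 'hour_of_day': 2, 'is_weekend': True}
```

A logon at 2 a.m. on a Sunday is invisible as a raw timestamp but becomes an obvious candidate once the hour and day are separate features.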

Perhaps you want to see patterns over time. For example, in network data, you might want to learn the sequences in which users normally log into certain systems, so that abnormal sequences stand out. The regular features do not provide this context, so you might need to create a time series or create N-grams from the original data.
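One way to sketch the N-gram idea is to turn a user's ordered list of logons into overlapping pairs and then look for pairs that never occurred before. The host names and the `history` and `today` sequences below are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(events: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive events, in order."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


# Hypothetical logon sequences for one user.
history = ["mail", "wiki", "build", "mail", "wiki", "build",
           "mail", "wiki", "build"]
today = ["mail", "wiki", "payroll-db"]

seen = Counter(ngrams(history, 2))
unseen = [pair for pair in ngrams(today, 2) if pair not in seen]

print(unseen)  # [('wiki', 'payroll-db')]
```

Each host in `today` might be perfectly ordinary on its own. It is the transition from `wiki` to `payroll-db` that has no precedent, and only a sequence feature can expose that.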

Define Anomalies with Use Cases

But what if you are looking for several different types of anomalies? For example, what if you want to detect a user logging into a different system than normal and also detect a user logging in at an unusual time of day? Perhaps you are also interested in patterns of network behavior between systems on your network.

One solution is to create use cases for each type of anomaly you want to detect. In Cybersecurity, these are often called misuse cases, since they are not about how computer systems should work, but how they might be attacked. Here are some sample use cases you might use for intrusion detection:

  • User logging in at an unusual time of day
  • Unusual volume of network traffic between two computer systems
  • Abnormal status message
  • System account logging into an unusual computer
  • Irregular pattern of logons

After your use cases are defined, you can select the features that best help to identify anomalies for each use case. Then, running the anomaly detection algorithm separately for each use case, you can focus on the anomalies in which you are most interested.
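The sketch below shows the shape of this workflow under simplifying assumptions. Each use case is just a named tuple of features, and the "detector" is a simple frequency-based rarity score standing in for whatever anomaly detection algorithm you actually use. The event records and use case names are hypothetical.

```python
from collections import Counter

# Hypothetical logon events: four routine logons and two odd ones.
events = [
    {"user": "alice", "host": "web01", "hour": 9},
    {"user": "alice", "host": "web01", "hour": 9},
    {"user": "alice", "host": "web01", "hour": 9},
    {"user": "alice", "host": "web01", "hour": 9},
    {"user": "alice", "host": "web01", "hour": 3},   # unusual hour
    {"user": "alice", "host": "db07", "hour": 9},    # unusual host
]

# Each use case gets its own feature set.
USE_CASES = {
    "unusual_login_hour": ("user", "hour"),
    "unusual_host": ("user", "host"),
}


def rarity_scores(records, features):
    """Score each record by how rare its feature combination is (0 to 1)."""
    keys = [tuple(record[f] for f in features) for record in records]
    counts = Counter(keys)
    return [1 - counts[key] / len(records) for key in keys]


# Run the stand-in detector once per use case.
results = {name: rarity_scores(events, features)
           for name, features in USE_CASES.items()}

for name, scores in results.items():
    top = max(range(len(events)), key=lambda i: scores[i])
    print(name, "-> event", top)
# unusual_login_hour -> event 4
# unusual_host -> event 5
```

The same six events produce a different top anomaly for each use case, because each run only sees the features that matter for the question it is asking.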

With the results of different anomaly detectors, each based on a different feature set, it is helpful to have some method or formula for combining the results of the use cases. For example, if three of four use cases indicate that an event is an anomaly, it might get a higher score than an event that triggers only one use case.

Although combining results can be helpful for prioritizing anomalies, it is also good to present the score for each use case. Presenting this information to the end-user provides some explainability. The user will understand why an alert is considered an anomaly based on the use case that triggered it.
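A simple version of such a combining formula might look like the following sketch. It assumes each use case produces a score between 0 and 1 for an event, and it uses an arbitrary threshold of 0.8 to decide whether a use case has triggered. The function name, threshold, and example scores are all illustrative, and a weighted sum or another formula may suit your data better.

```python
from typing import Dict


def summarize_alert(scores: Dict[str, float], threshold: float = 0.8) -> dict:
    """Combine per-use-case scores while keeping the breakdown visible."""
    triggered = sorted(name for name, score in scores.items()
                       if score >= threshold)
    return {
        # Fraction of use cases that flagged this event.
        "combined_score": len(triggered) / len(scores),
        # Which use cases explain the alert.
        "triggered": triggered,
        # The raw score for every use case, for the analyst to inspect.
        "per_use_case": dict(scores),
    }


event_a = {"unusual_login_hour": 0.95, "unusual_host": 0.90,
           "traffic_volume": 0.85, "status_message": 0.10}
event_b = {"unusual_login_hour": 0.92, "unusual_host": 0.05,
           "traffic_volume": 0.20, "status_message": 0.10}

print(summarize_alert(event_a)["combined_score"])  # 0.75
print(summarize_alert(event_b)["combined_score"])  # 0.25
print(summarize_alert(event_b)["triggered"])       # ['unusual_login_hour']
```

The combined score ranks event A above event B, while the `triggered` list and the per-use-case scores tell the analyst why each alert was raised.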

You may also find, through experimentation, that some of the use cases you began with are not worthwhile and only create noise. It is good to eliminate them, since they do not help find the anomalies you are trying to detect. This will help reduce your false positive rate.

Conclusion

Anomaly detection using Machine Learning is not magic. It takes planning and analysis to find the right approach and get the most out of your algorithm.

Since anomaly detection is in the eye of the beholder, it is important to define the types of anomalies you want to detect. You can do this by narrowing in on the features that are most relevant to your target anomalies.

Use cases are helpful to make sure you can detect the various types of anomalies that might be present in your data set. Running your anomaly detection algorithm separately with different feature sets that have been selected for each use case can yield very interesting and explainable results.

For detecting network intrusions, anomaly detection can create a lot of noise by spotting abnormal events that are not cyber-attacks. Try using these approaches to fine-tune your algorithm and present more meaningful data to the security analysts who respond to the results.

References

[1] V. Estivill-Castro, Why so many clustering algorithms: A position paper (2002), ACM SIGKDD Explorations Newsletter, 4(1)

