AI and Next-Gen Security Tech

UEBA, NGAV and XDR

Published in

Towards Data Science

Dec 30, 2020

Artificial intelligence (AI) techniques are changing the way organizations defend themselves against security threats. In this article, I'll discuss three primary security technologies: user and entity behavior analytics (UEBA), next-generation antivirus (NGAV), and eXtended Detection and Response (XDR). I'll also review the machine learning algorithms and techniques they employ.

AI in Cybersecurity

Modern AI security software is largely based on machine learning (ML) algorithms, which power its automated and autonomous capabilities. How machine learning is applied in cybersecurity depends on the task being performed. However, a few key machine learning tasks help make cybersecurity operations more efficient:

Regression — models the relationship between variables so that one can be predicted from the others. You can use regression to predict the expected level of an activity, compare the prediction to the actual activity, and flag large deviations as security anomalies.
Classification — algorithms are trained on labeled past observations and then assign new observations to one of the learned categories. You can use classification to sort binary files into threat types, such as spyware, ransomware, and adware, and to separate them from legitimate software.
Clustering — a technique implemented with unsupervised algorithms, which group artifacts that share common features without needing labels. You can use clustering to identify distributed denial of service (DDoS) attacks. In this case, the clustering algorithm analyzes traffic sessions to find groups of sessions with similar characteristics that may be part of the same coordinated attack.
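To make the regression idea concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical baseline in which outbound traffic (in MB) grows roughly linearly with the number of active users. The function names, the data, and the three-standard-deviation threshold are illustrative assumptions and do not come from any specific product.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def train_baseline(xs, ys):
    """Fit the regression on a known-normal period and record the residual spread."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, pstdev(residuals)

def is_anomalous(model, x, y, k=3.0):
    """Flag an observation whose actual value is more than k residual std devs from the prediction."""
    a, b, sigma = model
    predicted = a + b * x
    return abs(y - predicted) > k * sigma

# Hypothetical baseline: active users vs. outbound MB during a normal week.
users = [10, 20, 30, 40, 50, 60]
outbound_mb = [102, 198, 305, 399, 497, 603]
model = train_baseline(users, outbound_mb)

print(is_anomalous(model, 35, 352))  # False: close to the predicted ~351 MB
print(is_anomalous(model, 35, 900))  # True: far above the prediction
```

The model is trained only on a period assumed to be normal. An attack in the training data would otherwise inflate the residual spread and hide itself.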

How AI is Used in UEBA

User and entity behavior analytics (UEBA) is a security technology that detects threats using machine learning analysis. It analyzes data streams from user accounts and entities (devices, applications, networks, etc.) in an IT environment and tries to identify outliers that might represent a security incident. It is especially useful for detecting insider threats. These are often invisible to traditional security tools because they involve activity performed by authorized users.

UEBA tools can identify anomalous behavior even in the absence of known patterns. These methods differ from traditional static, rule-based methods in that they are heuristic: they calculate a risk score that estimates how likely an event is to be abnormal rather than safe. When the risk score exceeds a certain threshold, the system generates a security alert.
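A minimal sketch of such a risk score follows. It assumes a single hypothetical feature: the number of files one user accesses per day. The deviation from the user's history is measured in standard deviations and then squashed into the range 0 to 1 with a logistic function. The midpoint, steepness, and 0.9 alert threshold are illustrative assumptions. The output is a heuristic score, not a calibrated probability.

```python
import math
from statistics import mean, pstdev

def risk_score(history, observed, midpoint=3.0, steepness=2.0):
    """Map the deviation from a user's baseline (in standard deviations) to a 0..1 score.

    The logistic squash is a heuristic choice: exactly 0.5 at `midpoint` deviations,
    approaching 1.0 for larger deviations. It is not a calibrated probability.
    """
    mu, sigma = mean(history), pstdev(history)
    sigma = sigma or 1.0  # avoid division by zero when the history is constant
    z = abs(observed - mu) / sigma
    return 1.0 / (1.0 + math.exp(-steepness * (z - midpoint)))

def should_alert(history, observed, threshold=0.9):
    """Generate an alert only when the risk score exceeds the threshold."""
    return risk_score(history, observed) > threshold

# Hypothetical history: files accessed per day by one user.
daily_file_access = [48, 52, 50, 49, 51, 50]
print(should_alert(daily_file_access, 51))   # False
print(should_alert(daily_file_access, 400))  # True
```

Real UEBA products combine many such signals per user and entity. This sketch only shows how a score-plus-threshold design turns a deviation into an alert decision.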

UEBA uses a variety of AI/ML techniques, including:

Supervised machine learning — a set of known good and known bad behaviors is fed into the algorithm, which learns to decide whether new behavior is more similar to the known good behaviors or to the known bad ones.
Bayesian networks — probabilistic models that combine supervised machine learning and rules to create behavioral profiles for each group of users or type of entity.
Unsupervised learning — these algorithms learn a baseline of normal behavior from unlabeled data, and can detect and warn when they see abnormal behavior (whether good or bad).
Reinforcement learning — hybrid, semi-supervised models, in which real alert resolutions are fed back to the system to fine-tune the model and improve the signal-to-noise ratio.
Deep learning — enables classification and investigation of alerts. The system trains on security alerts and their human classification results (for example, false positive, normal incident, severe incident), identifies relevant features, and uses these features to predict the classification of new security alerts.
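The supervised approach in the first item above can be sketched as a k-nearest-neighbors classifier. A new behavior vector receives the majority label of the k most similar labeled behaviors. The three features (logins per hour, MB uploaded, distinct hosts contacted) and the sample data are hypothetical. k-NN is just one simple choice of supervised algorithm, not necessarily what any UEBA product uses.

```python
import math
from collections import Counter

# Hypothetical labeled behaviors: (logins per hour, MB uploaded, distinct hosts contacted).
KNOWN_BEHAVIORS = [
    ((2, 5, 1), "good"),
    ((3, 8, 2), "good"),
    ((1, 4, 1), "good"),
    ((40, 900, 25), "bad"),
    ((35, 700, 30), "bad"),
    ((50, 1200, 20), "bad"),
]

def classify_behavior(sample, labeled=KNOWN_BEHAVIORS, k=3):
    """Return the majority label among the k labeled behaviors closest to `sample`."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify_behavior((2, 6, 1)))       # good
print(classify_behavior((45, 1000, 22)))  # bad
```

The features have very different scales, so a real system would normally normalize them first. Otherwise, the feature with the largest values dominates the distance.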

How AI is used in Next-Generation Antivirus (NGAV)

Next-generation antivirus (NGAV) tools were designed to overcome the shortcomings of legacy antivirus (AV), which relied on file-based signatures and heuristics.

Legacy AV relied on prior knowledge of malware characteristics and behavior to identify new infections. Attackers quickly learned to bypass these signature-based detection methods. They created polymorphic malware or constantly modified malware properties, which shortens the useful life of each signature and eventually renders it useless.
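The weakness of exact signatures is easy to demonstrate. This sketch assumes the simplest kind of signature, a SHA-256 hash of the whole file, and uses a made-up payload. Real AV signatures are often byte patterns rather than whole-file hashes, but the same principle applies: a small change in the right place defeats an exact match.

```python
import hashlib

# Hypothetical "known malware" bytes; any byte string works for the demonstration.
original = b"MZ\x90\x00 hypothetical malicious payload \x13\x37"

# Hypothetical signature database: SHA-256 digests of known samples.
signature_db = {hashlib.sha256(original).hexdigest()}

def signature_match(sample: bytes) -> bool:
    """Return True if the sample's whole-file hash is in the signature database."""
    return hashlib.sha256(sample).hexdigest() in signature_db

# A "polymorphic" variant: flip one bit in the last byte.
variant = original[:-1] + bytes([original[-1] ^ 0x01])

print(signature_match(original))  # True
print(signature_match(variant))   # False
```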

NGAV detects new, unknown threats without relying on a complex rule set or a signature database. Instead, it looks for similarities between the activity of system processes and known malware characteristics to identify suspicious behavior.

NGAV relies on the following machine learning techniques to identify unknown malware:

Static features — extracted from software code without executing it. These features are obtained from the binary content of executables or from assembly language files (for example, the output of a disassembler), and compared to features of known malware.
String analysis — involves extracting all printable strings from an executable or program. Searching these strings is one of the simplest ways to get clues about what a program does. They can include, for example, URLs the program connects to, paths of files it accesses or modifies, and the names of application menus.
N-grams analysis — an N-gram is a contiguous sequence of N bytes. N-grams can be extracted from the binary of a suspected malware sample, and the N-grams it contains can be compared to N-grams extracted from known malicious binaries.
API commands — most programs make application programming interface (API) calls, and the calls a program makes reflect how it operates. This makes it possible to model a program's behavior using its API function calls.
Entropy — attackers tend to compress or encrypt malware code to avoid detection. Entropy analysis helps here because it measures how random (unpredictable) the distribution of byte values in the code is. Encrypted, compressed, or obfuscated code tends to have higher entropy than plain code.
Visualizing binary content — the binary code of a suspected malware sample can be visualized as a grayscale image. Each byte in the file is interpreted as a pixel intensity (0 to 255), and the bytes are arranged into a two-dimensional matrix. The image can then be analyzed using a variety of machine learning and computer vision techniques.
Control flow graph (CFG) — a CFG is a directed graph that represents how software executes, capturing structures such as loops, branches, and calls to other code segments. CFGs can also be used to identify malware, by comparing the CFG of a suspected sample to those of known malware. To do this, the algorithm must disassemble the binary file and analyze the code or API calls to construct a sparse graph. It then converts the graph to a vector and runs it through a classifier.
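Several of the static features listed above can be computed in a few lines. The following sketch uses only the Python standard library and performs four steps:

- extracts printable strings,
- compares two files by the Jaccard similarity of their byte N-gram sets,
- computes Shannon entropy in bits per byte,
- reshapes bytes into rows of grayscale pixel values.

The function names, the minimum string length of 4, the bigram default, and the 16-pixel row width are illustrative assumptions. Production NGAV engines use much richer features with trained classifiers on top of them.

```python
import math
import re
from collections import Counter

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII characters (like the Unix `strings` tool)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous sequence of n bytes."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def ngram_similarity(a: bytes, b: bytes, n: int = 2) -> float:
    """Jaccard similarity of the two files' N-gram sets (1.0 means identical sets)."""
    set_a, set_b = set(byte_ngrams(a, n)), set(byte_ngrams(b, n))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, at most 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

def to_grayscale_rows(data: bytes, width: int = 16) -> list:
    """Treat each byte as a pixel intensity (0-255) and split the file into rows of `width` pixels."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]

sample = b"\x00\x01MZ http://example.test/payload \xff\xfe C:\\Temp\\x.dll"
print(printable_strings(sample))
print(round(shannon_entropy(sample), 2))
print(ngram_similarity(b"abcd", b"abce"))  # 0.5
```

Entropy values close to 8 bits per byte are typical of compressed or encrypted data. This is why a high-entropy section in an executable is often treated as a hint that the code is packed.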

Use of AI in eXtended Detection and Response (XDR)

For UEBA and NGAV solutions to perform behavioral analysis effectively, businesses must have a robust and integrated dataset for machine learning analysis. This requires collecting and combining data from across the organization’s IT environment.

Both UEBA and NGAV have been integrated into a new threat detection and response solution called XDR. XDR operates across multiple layers of the security environment, including networks, endpoints (such as servers or personal computers), and cloud systems.

XDR can intelligently combine data from these different data sources and compile a unified “attack story”, by:

Collecting telemetry from IT systems and combining it with relevant threat intelligence data (such as threat feeds and databases).
Performing in-depth analysis of data from different security tools, and enabling dynamic querying of this data in a multidimensional space.
Feeding data into a data lake, enabling real-time queries, and training models on raw, unstructured data using both supervised and unsupervised ML algorithms.
Using classifiers to predict the most appropriate response to a threat and automatically activating the appropriate security defenses. This enables XDR to stop security breaches as they happen, not merely detect them.
Combining AI insights with human intelligence, allowing analysts to contribute their own insights, and continuously training on this feedback to improve predictions.
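The "attack story" idea can be illustrated with a toy correlation step. Events from different telemetry sources are grouped by the host they involve, ordered in time, and checked against a threat feed. The event schema, field names, hostnames, and IP address (taken from the 203.0.113.0/24 documentation range) are hypothetical. Real XDR platforms correlate on many more entities, such as users, processes, and file hashes.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """Hypothetical normalized telemetry event."""
    timestamp: int       # seconds since epoch
    source: str          # telemetry layer, e.g. "endpoint", "network", "cloud"
    host: str            # entity used for correlation
    description: str
    indicator: str = ""  # optional indicator of compromise, such as an IP address

def build_stories(events, threat_feed):
    """Group events by host, sort each group by time, and flag threat-feed matches."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    stories = {}
    for host, host_events in by_host.items():
        host_events.sort(key=lambda e: e.timestamp)
        stories[host] = {
            "sources": sorted({e.source for e in host_events}),
            "timeline": [f"{e.timestamp} [{e.source}] {e.description}" for e in host_events],
            "threat_intel_match": any(
                e.indicator in threat_feed for e in host_events if e.indicator
            ),
        }
    return stories

events = [
    Event(300, "network", "ws-17", "outbound connection", "203.0.113.9"),
    Event(100, "endpoint", "ws-17", "powershell spawned by office document"),
    Event(200, "cloud", "ws-03", "new API key created"),
]
for host, story in build_stories(events, threat_feed={"203.0.113.9"}).items():
    print(host, story["threat_intel_match"], story["timeline"])
```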

Conclusion

In this article I covered the use of AI/ML techniques in three types of security tools: UEBA, NGAV, and XDR. These tools, and many others, enable security teams to automatically analyze large quantities of security data and identify threats, including previously unknown ones, often with higher accuracy than traditional methods.

With a severe shortage of staff in the modern security operations center (SOC), AI can significantly improve the ability of security teams to review data about security incidents, identify real incidents, and respond effectively to prevent data breaches.