
CFA Paper Review: Coupled-hypersphere-based Feature Adaptation for Target-Oriented Anomaly…

A novel anomaly localization approach that adapts features to the target dataset through transfer learning

This article continues the stories Paper Review: Reconstruction by inpainting for visual anomaly detection and GANomaly Paper Review: Semi-Supervised Anomaly Detection via Adversarial Training. In the previous posts, I covered reconstruction-based approaches that identify anomalies in images. Models such as autoencoders and generative adversarial networks can be used to detect anomalies in images. How do they work?

They encode and reconstruct only normal images during training. At evaluation time, the hypothesis is that these models should not reconstruct anomalous images well, since such images were never seen during training; defective images should therefore receive a higher anomaly score than defect-free ones. However, this task is challenging with complex datasets, and these approaches can sometimes produce good reconstructions even for abnormal images, failing to distinguish between abnormal and normal images.
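
The reconstruction-based scoring idea can be sketched with a toy stand-in for a trained autoencoder. The `reconstruct` function below is hypothetical: it simply pulls inputs toward the normal data the model would have been trained on.

```python
import numpy as np

def reconstruct(x):
    # hypothetical autoencoder trained on normal data: it can only
    # reproduce images close to the normal mean (all zeros here)
    normal_mean = np.zeros_like(x)
    return 0.9 * normal_mean + 0.1 * x

def score(x):
    # anomaly score = mean squared reconstruction error
    return float(((x - reconstruct(x)) ** 2).mean())

normal = np.zeros(16)          # a defect-free image (flattened)
defect = np.full(16, 3.0)      # an anomalous image
```

The defective image gets a higher score because the model cannot reproduce it, which is exactly the hypothesis the paragraph above describes.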

For this reason, I researched other anomaly detection methods and came across a novel approach called Coupled-hypersphere-based Feature Adaptation (CFA). There are two main ideas behind this approach:

  • It makes use of a pre-trained CNN to extract the features of patches.
  • It adopts transfer learning to adapt the features to the target dataset so that abnormal features can be clearly distinguished from normal ones.

In this post, I am going to review the paper that introduced this novel anomaly detection model.

Outline

  1. Requirements
  2. Overview of CFA
  3. Experiment Settings
  4. Quantitative Results
  5. Qualitative Results

1. Requirements

When you read a paper, there are always some concepts that are taken for granted but are necessary to understand the work deeply. I recommend taking a look at this section if you don’t know one of these terms:

  • SPADE
  • Transfer Learning
  • Visual anomaly detection

SPADE

Semantic Pyramid Anomaly Detection (SPADE) is an anomaly detection approach that uses pre-trained CNNs, such as ResNet-18 and Wide ResNet-50, to extract meaningful features [2]. Unlike CFA, this approach exploits a CNN pre-trained on ImageNet without learning from the target dataset, which can have a completely different distribution from the dataset the pre-trained CNN was trained on.

SPADE consists of 3 different phases to solve the task of anomaly detection:

  1. The pre-trained CNN extracts features from the target dataset.
  2. The second phase uses KNN to retrieve the K nearest normal images from the training set for each test image. The distance between the extracted features representing normality in the training dataset and the extracted features of the test image is computed with the Euclidean metric.
  3. The third phase finds dense pixel-level correspondences between the target image and the normal images. Target image regions that have no close matches in the normal images retrieved in the second phase are labeled as anomalous.
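
The second phase can be sketched as follows. This is a minimal illustration with flat feature vectors, not SPADE's actual implementation:

```python
import numpy as np

def knn_retrieve(train_feats, test_feat, k=5):
    """Retrieve the indices of the K nearest normal training images
    for a test image, using the Euclidean distance between
    image-level feature vectors (flat vectors for illustration)."""
    dists = np.linalg.norm(train_feats - test_feat, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

rng = np.random.default_rng(0)
train = rng.normal(size=(10, 64))              # features of 10 normal images
query = train[3] + 0.01 * rng.normal(size=64)  # test image close to sample 3
idx, d = knn_retrieve(train, query, k=3)       # idx[0] is sample 3
```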

Transfer Learning

Transfer Learning is a deep learning research area focused on applying previously acquired knowledge in one domain to solve a different but related task. For example, you can use a pre-trained CNN, such as ResNet which was previously trained on ImageNet, to classify images into the categories of cats and dogs.
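
The idea can be illustrated with a toy NumPy example: a fixed "pre-trained" feature extractor is reused unchanged, and only a small new head is trained on the downstream task. The random projection standing in for frozen CNN layers and the synthetic labels are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the frozen pre-trained layers: a fixed projection
# that is never updated (a toy assumption, not a real CNN).
W_pre = rng.normal(size=(32, 8))

def extract(x):
    return x @ W_pre   # "pre-trained features", kept frozen

# New downstream task (think cats vs dogs): only a small linear
# head on top of the frozen features is trained.
X = rng.normal(size=(200, 32))
y = (extract(X)[:, 0] > 0).astype(float)   # toy labels, learnable from the features

feats = extract(X)
w = np.zeros(8)
b = 0.0
for _ in range(500):   # plain gradient descent on the logistic loss
    p = 1 / (1 + np.exp(-(feats @ w + b)))
    grad = p - y
    w -= 0.1 * (feats.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

accuracy = ((feats @ w + b > 0).astype(float) == y).mean()
```

The key point is that `W_pre` carries over unchanged to the new task, which is the essence of transfer learning.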

Visual Anomaly Detection

Visual anomaly detection is an important problem in the field of machine learning that combines computer vision and anomaly detection [3]. It can be further grouped into two different categories:

  • image-level anomaly detection only tries to understand if the whole image is anomalous or normal.
  • pixel-level anomaly detection locates the abnormal regions within the image. For this reason, it’s often called anomaly localization.
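
The two granularities are related. A common convention, assumed here rather than prescribed by any particular paper, derives the image-level score from the pixel-level map:

```python
import numpy as np

# Pixel-level output: one anomaly score per pixel (toy 4x4 map).
score_map = np.zeros((4, 4))
score_map[1, 2] = 0.9              # a small localized defect

# A common convention (an assumption here): the image-level score
# is the maximum over the pixel-level anomaly map.
image_score = float(score_map.max())
```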

2. Overview of CFA

Coupled-hypersphere-based Feature Adaptation (CFA) is an anomaly localization approach that combines feature extractors with transfer learning. Indeed, it exploits the principles of transfer learning to create more robust and generalizable features that make it possible to determine whether an input image from the target dataset is anomalous or not.

Previous works that use only pre-trained CNNs without transfer learning, such as SPADE, PaDiM, and PatchCore, achieve very good performance thanks to large pre-training datasets like ImageNet. However, when the target dataset is completely different from ImageNet, the features produced in the intermediate layers are consequently biased. This approach makes two other main contributions:

  • A novel loss function based on soft-boundary regression is proposed, which searches for a hypersphere with minimum radius to cluster normal features. It allows the patch descriptor to extract discriminative features, so that abnormal features can be clearly distinguished from normal ones.
  • The memory bank is compressed to a size independent of the size of the target dataset. This alleviates the risk of overestimating the normality of abnormal features and keeps the spatial complexity low.

Coupled-Hypersphere-based Feature Adaptation

The bias problem of the pre-trained CNN is solved by combining a hypersphere-based loss function with a memory bank of clustered normal features, obtained with the K-means algorithm, since normal features are the reference for distinguishing abnormal ones. The attraction loss pulls each target-oriented feature φ(p_t) toward its nearest memorized features c_t^k:

L_att = 1/(T·K) Σ_t Σ_k max(0, D(φ(p_t), c_t^k) − r²)

where K is the number of nearest neighbors matching the target-oriented features, D is the distance metric, and r is the radius of the hypersphere. Thus, CFA performs feature adaptation by optimizing the parameters that produce the target-oriented features φ(p_t) to minimize the loss L_att through transfer learning.

To avoid overestimating the normality of abnormal features, an additional loss is defined. Hard negative features, defined as the (K+j)-th nearest neighbors c_t^j of p_t, are used for contrastive supervision, leading to a more discriminative φ(p_t). The loss L_rep supervises φ contrastively so that the hypersphere centered on c_t^j repels p_t:

L_rep = 1/(T·J) Σ_t Σ_j max(0, r² − D(φ(p_t), c_t^j) − α)

where J is the total number of hard negative features used for contrastive supervision and α is the hyperparameter that controls the balance between the two losses, L_att and L_rep. As a result, the two losses are combined into a single objective:

L_CFA = L_att + L_rep
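
A minimal NumPy sketch of these two losses follows, under the assumption of squared Euclidean distances; the values of r, α, K, and J are placeholders, not the paper's hyperparameters:

```python
import numpy as np

def cfa_loss(phi, C, K=3, J=3, r=1.0, alpha=0.1):
    """Sketch of the CFA objective. phi: (T, d) target-oriented patch
    features; C: (M, d) memory bank. The K nearest memorized features
    attract phi inside a hypersphere of radius r (L_att), while the
    next J nearest act as hard negatives that repel it (L_rep).
    Squared Euclidean distance and all constants are assumptions."""
    d2 = ((phi[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (T, M)
    d2_sorted = np.sort(d2, axis=1)
    l_att = np.maximum(0.0, d2_sorted[:, :K] - r ** 2).mean()
    l_rep = np.maximum(0.0, r ** 2 - d2_sorted[:, K:K + J] - alpha).mean()
    return l_att + l_rep

rng = np.random.default_rng(0)
phi = rng.normal(size=(16, 8))   # 16 patch features
C = rng.normal(size=(10, 8))     # 10 memorized features
loss = cfa_loss(phi, C)
```

In the real model this loss would be minimized with respect to the parameters of the patch descriptor that produces φ(p_t).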

Memory Bank Compression

The goal is to construct an efficient memory bank. First, an initial memory bank C0 is built by applying K-means clustering to the features extracted from the first normal sample x0 of the training set X. Then the memory bank is updated with the following steps:

  • Infer the i-th normal sample and search the previous memory bank C_{i-1} for the set Ci^{NN} of nearest patch features.
  • Compute the memory bank of the next state, Ci, as the exponential moving average (EMA) of Ci^{NN} and C_{i-1}.

The final memory bank C is obtained by repeating the above process |X| times for all normal samples of the training set.
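
The update loop can be sketched as follows; the momentum value is an assumption, and the paper's exact EMA formulation may differ:

```python
import numpy as np

def update_bank(C_prev, feats, momentum=0.9):
    """One update step: for each memorized feature, find its nearest
    patch feature of the current normal sample (C_i^NN) and blend it
    into the bank with an exponential moving average. The momentum
    value is an assumption, not taken from the paper."""
    d2 = ((C_prev[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    nearest = feats[d2.argmin(axis=1)]   # C_i^NN, shape (M, d)
    return momentum * C_prev + (1 - momentum) * nearest

rng = np.random.default_rng(1)
C = rng.normal(size=(5, 4))              # initial memory bank C0
for _ in range(3):                       # three normal training samples
    sample_feats = rng.normal(size=(20, 4))
    C = update_bank(C, sample_feats)
```

Note that the bank keeps its fixed shape no matter how many samples are processed, which is the point of the compression.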

We can notice that the space complexity is reduced compared to other approaches based on feature extractors without transfer learning. In particular, it does not depend on the size of the target dataset |X|.

Scoring Function

The anomaly score is defined using the minimum distance between the target-oriented features φ(pt) and the memorized features.

However, the boundaries between clusters of normal features are not clear-cut, and it’s hard to distinguish abnormal features precisely with this naive anomaly score. For this reason, a novel scoring function is proposed that takes the certainty of φ(pt) into account: the more confidently φ(pt) is matched, the shorter its distance to a specific memorized feature compared to the other memorized features. The softmin function measures how close the nearest memorized feature c is relative to the others.
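
One plausible way to combine the minimum distance with a softmin certainty term is sketched below. This is an illustration of the idea, not the paper's exact scoring function; the combination and the temperature tau are assumptions:

```python
import numpy as np

def anomaly_score(phi_p, C, tau=1.0):
    """Illustrative scoring: the naive score is the minimum distance
    to the memory bank; a softmin over all distances measures how
    confidently the nearest memorized feature is matched, and the
    naive score is down-weighted by that confidence. The combination
    and the temperature tau are assumptions, not the paper's formula."""
    d = np.linalg.norm(C - phi_p, axis=1)
    weights = np.exp(-d / tau)
    softmin = weights / weights.sum()        # sums to 1
    return float(d.min() * (1.0 - softmin.max()))

C = np.eye(3)                                # toy memory bank
s_normal = anomaly_score(C[0], C)            # matches a memorized feature
s_abnormal = anomaly_score(np.full(3, 10.0), C)
```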

During the evaluation of the novel approach CFA, we can obtain the anomaly score map, which constitutes the final output for anomaly localization.

3. Experiment Settings

Two datasets are considered as benchmarks to evaluate the novel approach: the MVTec AD and RD-MVTec datasets. While MVTec AD is a comprehensive industrial dataset with 5354 high-resolution images divided into 15 categories, RD-MVTec is a copy of MVTec AD with unaligned samples: the images are randomly rotated within ±10 degrees. After this transformation, the samples are resized to 256×256 and randomly cropped to 224×224.

The performance is evaluated using the Area Under the Receiver Operating Characteristic curve (AUROC) as the metric. The image-level AUROC is used to evaluate anomaly detection performance, while the pixel-level AUROC evaluates anomaly localization performance.

The experiments were performed using CNNs pre-trained on ImageNet, where feature maps are extracted from the intermediate layers {C2, C3, C4} of each pre-trained CNN. A 1×1 CoordConv layer is used as the patch descriptor and is trained for 30 epochs.
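
One common way such methods turn the intermediate maps {C2, C3, C4} into per-patch descriptors is to align them to a common resolution and concatenate channels. The sketch below (nearest-neighbor upsampling, toy ResNet-like shapes) is an assumption, not the paper's exact pipeline:

```python
import numpy as np

def build_patch_features(c2, c3, c4):
    """Align {C2, C3, C4} to C2's spatial resolution via
    nearest-neighbor upsampling and concatenate them channel-wise,
    giving one descriptor per patch position. This layout is a common
    convention, assumed here rather than taken from the paper."""
    h, w = c2.shape[1:]
    def upsample(f):
        fh, fw = f.shape[1:]
        ys = np.arange(h) * fh // h          # nearest source rows
        xs = np.arange(w) * fw // w          # nearest source cols
        return f[:, ys][:, :, xs]
    stacked = np.concatenate([c2, upsample(c3), upsample(c4)], axis=0)
    return stacked.reshape(stacked.shape[0], -1).T   # (h*w, channels)

# toy shapes mimicking ResNet-style strides on a 224x224 input
c2 = np.zeros((64, 56, 56))
c3 = np.zeros((128, 28, 28))
c4 = np.zeros((256, 14, 14))
patches = build_patch_features(c2, c3, c4)
```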

4. Quantitative Results

Table 1 and Table 2 show respectively the performance of different anomaly localization methods on the MVTec AD dataset and RD-MVTec AD dataset.

  • CFA++ presents slightly lower pixel-level AUROC scores than CFLOW when considering all the classes together on the MVTec AD dataset. However, it should be noted that it obtains this performance with a memory bank of smaller spatial complexity.
  • On the RD-MVTec AD dataset, the performance of all anomaly localization approaches is lower than on the MVTec AD dataset. In particular, SPADE seems to be the approach most sensitive to the rotation of the images, which drastically decreases its AUROC scores.

Table 3 makes the performance of CFA++ much clearer class by class by visualizing the image-level AUROC score per class on the MVTec AD dataset. It’s worth noticing that CFA++ outperforms all the other approaches at the class level, thanks to the adaptation of the features to the target dataset, while CFLOW performs worse than CFA++ when dealing with one class at a time.

Table 4 compares the performance of anomaly detection/localization for different pre-trained CNN backbones. CFA++ obtains the highest performance when it uses EffiNet-B5 and ResNet18 as feature extractors.

5. Qualitative Results

The qualitative results help interpret whether the features produced by CFA enable distinguishing normal images from abnormal ones. The following figure shows the anomaly scores of patch features for two examples: a bottle with a defect that is easy to identify, and a cable with a more challenging anomaly.

In the anomaly score maps, red indicates a high anomaly score. This visualization highlights the difference between features extracted with and without transfer learning:

  • When features are obtained without transfer learning, the normality of normal features is underestimated: they receive scores similar to those of abnormal features. It’s then hard to tell the two apart, since the boundary based on the anomaly score is ambiguous (second column – Biased).
  • When target-oriented features are obtained with transfer learning, they are well clustered, as you can see in the third column of Figure 1. However, clustering alone is not enough to score the uncertain abnormal features of the hard case precisely. The scoring function proposed in the paper computes the anomaly score by taking the certainty into account; in this way, the abnormal features can be distinguished from the normal ones even in the hard case.

Below are the anomaly localization results showing the abnormal areas identified by CFA.

Takeaways

I hope you appreciated this review of CFA. As I never tire of repeating, anomaly detection is a challenging problem, and having an overview of these methods helps to understand which one is the most appropriate in a particular context.

In the last article, I suggested taking a look at the papers that explain Skip-GANomaly and AnoGAN. In this post, I advise you to read the papers regarding SPADE, PaDiM, CFLOW, and FastFlow. All these approaches have in common that they exploit pre-trained CNNs to detect and localize anomalies. Let me know if you have other reading suggestions; sharing knowledge is the best way to improve. Thanks for reading. Have a nice day!

References:

[1] CFA: Coupled-hypersphere-based Feature Adaptation for Target-Oriented Anomaly Localization, S. Lee, S. Lee and B. Cheol Song, (2022)

[2] Sub Image Anomaly Detection with Deep Pyramid Correspondences, N. Cohen and Y. Hoshen, (2021)

[3] Visual Anomaly Detection for Images: A Systematic Survey, J. Yang, R. Xu, Z. Qi and Y. Shi, (2022)

GitHub Repository

GitHub – sungwool/CFA_for_anomaly_localization



