
SSWL-IDN: Self-Supervised CT Denoising

A review of our recent CT Denoising paper "Window-Level is a Strong Denoising Surrogate"


Image by Author

In this article, I will discuss our recent work, SSWL-IDN, a new self-supervised CT denoising method, by Ayaan Haque (me), Adam Wang, and Abdullah-Al-Zubaer Imran, from Saratoga High School and Stanford University’s RSL. In the paper, we introduce a novel self-supervised window-level prediction surrogate task for CT denoising. Because this surrogate is task-relevant, i.e., directly related to the downstream task, it yields improved performance over recent methods. Our paper was recently accepted to MICCAI MLMI 2021 and will be presented in September. This article covers the problem we address, our methods, and (briefly) our results. The paper is available on arXiv, the code is available on GitHub, and our project page is available here.

Overview

What is CT Denoising and why is it important?

For those without a strong medical imaging background: CT is a prominent imaging modality that relies on radiation dose, which creates a tradeoff between image quality and dose. The higher the radiation dose, the less noise the images contain. However, high radiation doses are harmful to patients, so it is desirable to scan patients at lower doses. But as noise in the images increases, their diagnostic value decreases, since noise may obscure important structures. Denoising CT images to get the best of both worlds is therefore a critical medical imaging problem.

To perform deep-learning-based CT image denoising, a model takes a low-dose CT scan (LDCT) as input and predicts the corresponding full-dose CT scan (FDCT). Full-dose scans are collected at routine dose, while low-dose scans are generally collected at quarter dose. This poses a glaring challenge: medical data is hard to acquire, especially for CT, where obtaining both a clean reference and a low-dose version of the same scan is difficult. With limited labeled data, deep learning performance decreases, so learning frameworks that leverage unlabeled data are critical.

What is Self-Supervised Learning?

Acquiring reference images is challenging due to the harmful nature of radiation as well as the difficulty of performing two identical scans at different radiation doses. Thus, it is desirable to train denoising models with limited reference data. Self-Supervised Learning (SSL) has emerged as a promising alternative to fully-supervised learning, as it can utilize large amounts of unlabeled training data. In an SSL scheme, synthetic labels are generated from the data itself, for both labeled and unlabeled examples. Similar to transfer learning, SSL pre-trains a model on a surrogate task, but on the same dataset instead of one from a foreign domain, and then fine-tunes the pre-trained model on the downstream, or main evaluation, task.

Self-supervised learning is a form of unsupervised learning, as it trains on a separate task with entirely free labels. A common example is randomly rotating images and having a model predict the rotation angle.
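As a toy illustration of how such free labels arise (this is a generic sketch, not code from the paper), one can build a rotation-prediction batch from a single image, where the label is simply the index of the applied rotation:

```python
import numpy as np

def rotation_pretext_batch(image):
    """Create a self-supervised batch: each rotated copy of `image`
    is paired with the index of its rotation (0, 90, 180, 270 degrees).
    The labels come for free -- no human annotation is required."""
    rotations = [np.rot90(image, k) for k in range(4)]
    labels = np.arange(4)  # 0 -> 0 deg, 1 -> 90 deg, 2 -> 180 deg, 3 -> 270 deg
    return np.stack(rotations), labels

# A classifier would then be trained to predict `labels` from the images.
x = np.random.rand(64, 64)
batch, labels = rotation_pretext_batch(x)
```

The surrogate network learns features useful for recognizing image content, which is why such pretext tasks transfer to downstream tasks at all.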

Our form of self-supervised learning uses two separate tasks: a surrogate task and a downstream task. This should not be confused with entirely unsupervised methods such as Noise2Noise (Lehtinen et al. 2018) and Noise2Void (Krull et al. 2018), which use no reference scans at all and do not follow the conventional SSL definition from the computer vision domain. Since those methods use no reference scans, as their authors themselves argue, comparing them to a method that uses reference scans in any fashion would be unfair. To support this claim, we compare our SSL algorithm to Noise2Void.

In this paper, we use SSL to improve the performance of deep denoising models with limited reference FDCT. We propose a novel denoising surrogate: predicting window-leveled CT images from non-window-leveled images as a pretext task. Unlike many existing self-supervised learning methods, our proposed self-supervised window-leveling (SSWL) is a task-relevant surrogate, directly related to the downstream task and prioritizing similar feature learning. Furthermore, we limit all our experiments to the 5% dose level, potentially enabling an aggressive dose-reduction mechanism, and demonstrate effectiveness even in such low-dose settings. Our primary contributions are as follows:

  • A novel, task-relevant self-supervised window-level prediction surrogate directly related to the downstream denoising task
  • An innovative residual-based VAE architecture coupled with a hybrid loss function to simultaneously penalize the model pixel-wise and perceptually
  • Extensive experimentation with varied quantities of labeled data on different proposed components on in- and cross-domain data demonstrating improved and effective denoising even from extremely low dose (5%) CT images

SSWL-IDN

Window Leveling and CT Denoising

The relationship between Denoising and Window-Leveling (Image by Author)

In CT denoising, the input images are LDCT and the reference images are FDCT. The relationship between the two can be represented as LD = FD + Noise. The denoising model aims to remove the noise from the LD to recover the FD.

In CT imaging, window-leveling is the process of modifying the grayscale of an image, using the CT numbers, to highlight, brighten, and contrast important structures. The relationship between window-leveled (WL) and non-window-leveled (NWL) images is represented by WL = a * NWL + b, where a and b are window-leveling parameters. As shown, this relationship can be likened to the relationship between FDCT and LDCT from an image-transformation standpoint. Therefore, training a model to predict window-leveled images from non-window-leveled images is a task related to denoising.
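The linear transform above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the clipping to a display range and the soft-tissue window values are assumptions, and in practice the window center and width would be read from the DICOM metadata:

```python
import numpy as np

def window_level(nwl, center, width, out_range=(0.0, 1.0)):
    """Apply a linear window-level transform WL = a * NWL + b, then clip
    to the display range. `center` and `width` would normally come from
    the DICOM metadata (WindowCenter / WindowWidth tags)."""
    lo, hi = center - width / 2.0, center + width / 2.0
    a = (out_range[1] - out_range[0]) / (hi - lo)  # scale
    b = out_range[0] - a * lo                      # shift
    return np.clip(a * nwl + b, *out_range)

# Example: a typical abdominal soft-tissue window (center=40 HU, width=400 HU).
hu = np.array([-1000.0, -160.0, 40.0, 240.0, 3000.0])
wl = window_level(hu, center=40, width=400)
# Air (-1000 HU) clips to 0, bone (3000 HU) clips to 1,
# and the soft-tissue range is stretched across the grayscale.
```

Because the target is a deterministic function of the input and its metadata, the labels for this task cost nothing to produce.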

Schematic of the proposed SSWL-IDN model. For the SSL surrogate task, the model predicts window-leveled images from non-window-leveled images. For the downstream task, the model denoises the input LDCT to match the FDCT. (Image by Author)

Predicting window-leveled images is a self-supervised task because the window-leveled images can be created for free: the information needed to produce them is available in the DICOM metadata. Formulating window-leveling as a pretext to the downstream denoising task is especially appropriate when obtaining full-dose reference images is extremely difficult. And because the task is domain-specific, it enables more important and relevant feature learning than foreign or arbitrary surrogates.

Therefore, our proposed self-supervised learning method comprises two steps: fully-supervised pre-training on the window-leveling task, followed by fine-tuning on the small labeled denoising task. For pre-training, we prepare both an NWL and a WL version of each LDCT scan, for both labeled and unlabeled data. The loss is optimized for predicting the WL LDCT from the input NWL LDCT. Our surrogate is end-to-end, unlike many other methods that use unrelated tasks, as no architectural or loss changes are required between tasks.
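The two-phase flow can be mocked up with a toy linear "model" standing in for the network (a minimal sketch under simplified assumptions, not the paper's code). The point is that the weights learned on the window-leveling surrogate are carried directly into denoising fine-tuning, with no architectural or loss change:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear(x, y, w, b, lr=0.1, steps=500):
    """Fit y ~ w*x + b by gradient descent on MSE
    (a toy stand-in for the denoising network)."""
    for _ in range(steps):
        grad = w * x + b - y
        w -= lr * np.mean(grad * x)
        b -= lr * np.mean(grad)
    return w, b

# Phase 1: fully-supervised pre-training on the surrogate task --
# predict the window-leveled signal (a linear transform; labels are free).
nwl = rng.normal(size=1000)
wl_target = 0.8 * nwl + 0.2
w, b = train_linear(nwl, wl_target, w=0.0, b=0.0)

# Phase 2: fine-tune the SAME weights on the small labeled denoising set.
fd = rng.normal(size=200)             # full-dose reference
ld = fd + 0.1 * rng.normal(size=200)  # LD = FD + Noise
w, b = train_linear(ld, fd, w, b, steps=200)
```

In the real method the "weights" are those of the RVAE, and both phases optimize the same loss over image pairs rather than scalars.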

Model Architecture and Loss Function

For the model architecture, we propose a Residual Variational Autoencoder (RVAE). Our model uses RED-CNN (Chen et al. 2017) as the backbone architecture and incorporates a VAE bottleneck. While residual-based VAEs have been proposed, they use residuals within the encoder and decoder separately (a ResNet encoder and a transposed-ResNet decoder) rather than residual connections between the encoder and decoder, as in previous methods. The reparameterization trick in the bottleneck improves FD predictions: by adding tunable noise, we can decrease overfitting, improve generalization, and help regularize the model.
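For readers unfamiliar with VAE bottlenecks, here is a minimal NumPy sketch of the reparameterization trick (the shapes and variance values are hypothetical, and this is not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """VAE reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, I). Writing the sample this way keeps the stochastic
    bottleneck differentiable w.r.t. mu and sigma, and the injected
    noise acts as a tunable regularizer."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

# Bottleneck sketch: the encoder would emit (mu, log_var) per latent dim.
mu = np.zeros(8)
log_var = np.full(8, -2.0)  # small variance -> mild noise injection
z = reparameterize(mu, log_var)  # fed to the decoder
```

In the full model, the decoder reconstructs the FD prediction from `z`, and the KL term of the loss (below in the article) keeps the latent distribution close to a unit Gaussian.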

For our loss, we use a hybrid of MSE and perceptual loss. This encourages both pixel-wise and perceptual learning, improving both quantitative and qualitative denoising performance. Our perceptual loss uses features extracted from a pre-trained VGG-19 network. Our total loss function is represented as:

Our total loss function (Image by Author)

where L_MSE is the standard MSE loss, L_perceptual is the perceptual loss, and β is the L_perceptual weight. For the VAE, L_KL is the KL divergence loss and α is its weight. µ and σ are the mean and standard deviation terms, both from the latent space. The KL divergence penalizes deviation of these two parameters from those of the target distribution.
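Assembled from the definitions above, a sketch of the hybrid objective might look as follows. This is an assumption-laden illustration, not the paper's code: the weights `alpha` and `beta` are placeholders, and the VGG-19 feature maps are assumed to be computed elsewhere and passed in as `feat_pred` / `feat_target`:

```python
import numpy as np

def hybrid_loss(pred, target, feat_pred, feat_target, mu, log_var,
                alpha=0.1, beta=0.1):
    """Hybrid objective: pixel-wise MSE + weighted perceptual loss
    (MSE between VGG-19 feature maps) + weighted KL divergence of the
    latent N(mu, sigma^2) from the unit Gaussian N(0, 1)."""
    l_mse = np.mean((pred - target) ** 2)
    l_perceptual = np.mean((feat_pred - feat_target) ** 2)
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims.
    l_kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return l_mse + beta * l_perceptual + alpha * l_kl
```

When the prediction matches the target, the features match, and the latent distribution is exactly N(0, 1) (µ = 0, log σ² = 0), every term vanishes, which is a quick sanity check on the formula.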

Results

We used the Mayo Low-Dose CT dataset. Since we aim to denoise at ultra-low doses, we scaled the quarter-dose scans down to 5% dose, allowing for thorough denoising evaluation. While from a clinical perspective a well-denoised quarter-dose scan is more desirable than a lower-quality denoised 5%-dose scan, from a computational perspective, showing that a model can remove high volumes of noise more appropriately evaluates its full potential to accurately remove noise.

We compared our RVAE architecture with various baseline and state-of-the-art architectures and compared our SSWL method to various baseline and state-of-the-art SSL training algorithms.

Our RVAE is compared to various architectures and outperforms them all (Image by Author)

As seen in the table above, our RVAE outperforms all the models with statistical significance. Importantly, we outperform two SOTA architectures, RED-CNN and DnCNN (Zhang et al. 2016), demonstrating the effectiveness of the RVAE approach. Moreover, our cross-domain results show the improved generalization of our architecture.

Our SSWL-IDN model outperforms all the baseline SSL approaches (Image by Author)

More importantly, our self-supervised window-leveling surrogate task outperforms baselines and two state-of-the-art methods, Noise2Void (N2V) and Noisy-As-Clean (NAC) (Xu et al. 2019). This shows the importance of task-relatedness for CT denoising and confirms the strong performance of our method. The hybrid loss also helps: our final model outperforms RVAE + SSWL alone.

SSWL and our RVAE outperforms other architectures and SSL methods (Image by Author)

The figure above demonstrates the ability of our network to accurately denoise 5%-dose CT scans. While there appears to be some oversmoothing and loss of structural detail, this can be attributed to the extremely low dose of the input scans. Moreover, our SSWL algorithm outperforms training without SSL, and our RVAE outperforms RED-CNN on its own.

SSWL outperforms other SSL methods on qualitative, ROI-based denoising (Image by Author)

Lastly, we confirm the qualitative superiority of our algorithm over other SSL methods in the figure above. As shown, when examining a specific ROI, we achieve a clearer denoised image than the other methods.

Final Thoughts

In this article, we have presented SSWL-IDN, a self-supervised denoising model with a novel, task-relevant, and efficient surrogate task of window-level prediction. We also propose a Residual-VAE specialized for denoising, as well as a hybrid loss leveraging benefits of both perceptual and pixel-level learning. We confirm each component of our method outperforms baselines on difficult 5% dose denoising for both in- and cross-domain evaluations, and when combined, the model significantly outperforms state-of-the-art methods.

This framework ultimately aims to help reduce the use of high CT radiation doses in clinical settings. If our method can train on limited labeled data and accurately denoise CT scans, then patients could be scanned at lower doses and the scans denoised prior to expert diagnosis or screening.

If you found this article or paper interesting, let me know! Here is the citation if you found this work useful:

@article{haque2021window,
      title={Window-Level is a Strong Denoising Surrogate},
      author={Haque, Ayaan and Wang, Adam and Imran, Abdullah-Al-Zubaer},
      journal={arXiv preprint arXiv:2105.07153},
      year={2021}
}
