[ CVPR 2018 / Paper Summary ] Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network

Jae Duk Seo
Towards Data Science
7 min read · Jul 10, 2018

One of the more interesting papers, from one of the most talented researchers.

Please note that this post is for my future self to look back and review the materials on this paper without reading it all over again.

Abstract

Synthesized medical data, for example CT images, have many different use cases: they can be used to increase the amount of training data, or to provide an X-ray attenuation map for radiation-therapy planning. The authors of this paper propose a cross-modality synthesis method that can:
1) create synthesized, realistic 3D images from unpaired data;
2) ensure consistent anatomical structures;
3) improve volume segmentation by using synthetic data when training samples are limited.

One very interesting fact is that all of the goals above are achieved with a single end-to-end model composed of mutually beneficial generators and segmentors. The generators are trained with an adversarial loss, a cycle-consistency loss, and a shape-consistency loss that is supervised by the segmentors (to meet goal 2). From the segmentors' point of view, the generators create more training data for them to learn from. The authors show that it is beneficial to solve these tasks as a coupled network rather than solving each one alone.

Introduction

Currently, in the clinical domain there is a huge need for a cross-modality image-transfer analysis system to assist clinical treatment. In the medical domain it is hard to collect certain types of data, not only because of patient privacy issues, but also because a new imaging modality may not yet be well established in clinical practice. This paper addresses both of those problems: first, it performs cross-modality image transfer, and second, it boosts segmentation performance by using the generated images.

The authors used GANs to generate images; however, there are a few problems. First, the shapes of the organs present in the images have to be preserved, since they carry medical information; second, the lack of data to train a GAN from scratch can cause many problems. The authors of this paper propose a general solution in which, given two sets of unpaired data in two modalities, the network simultaneously learns generators for cross-domain volume-to-volume translation and stronger segmentors that take advantage of synthetic data translated from the other domain, in an end-to-end fashion. They used 4,496 cardiovascular 3D images in MRI and CT modalities.

Related work

There are mainly two goals for medical image synthesis: the first is to generate realistic images in a different domain, and the second is to use the generated cross-domain data to improve the performance of downstream tasks such as classification. Quite a lot of work has already been done on this; however, learning from unpaired cross-domain data has not been explored. Finally, some other research uses adversarial learning as extra supervision on segmentation or detection networks.

Proposed Method

In this section the authors first explain image-to-image translation and then move on to volume-to-volume translation.

Image-to-Image Translation for Unpaired Data

Recently, GANs have been used for image generation tasks such as style transfer, and conditional GANs for paired translation. However, paired data is not always obtainable for a proper loss calculation. CycleGAN and other methods have been proposed to generalize the conditional GAN to unpaired data.

Problems in Unpaired Volume-to-Volume Translation

The lack of direct supervision is a huge problem, since unpaired translation has an intrinsic ambiguity with respect to geometric transformations. In short, geometric distortions can be recovered when the volume is translated back to the original domain without provoking any penalty in the data-fidelity cost.
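This ambiguity is easy to see with a toy example: if the forward mapping applies an invertible geometric distortion (here, a hypothetical left-right flip standing in for the generator) and the backward mapping undoes it, the cycle reconstruction is perfect, so the cycle-consistency loss alone cannot penalize the distortion:

```python
import numpy as np

# Toy illustration: let g_ab apply a left-right flip (a geometric
# distortion during "translation") and g_ba undo it on the way back.
def g_ab(x):
    return np.flip(x, axis=-1)

def g_ba(y):
    return np.flip(y, axis=-1)

x = np.arange(8.0).reshape(2, 4)       # stand-in for a volume slice
cycle = g_ba(g_ab(x))                  # x -> domain B -> back to A
cycle_loss = np.abs(cycle - x).mean()  # L1 cycle-consistency term
print(cycle_loss)  # 0.0, even though g_ab flipped the anatomy
```

This is exactly why the authors add a shape-consistency term on top of cycle consistency.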

Volume-to-Volume Cycle-consistency / Shape-consistency

First, to train the GAN, the authors used the adversarial cost function shown below.
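The equation appeared as an image in the original post. As a hedged reconstruction, the standard adversarial loss for this kind of model (with generator G_A : A → B and discriminator D_B; the notation is the usual CycleGAN-style convention, not necessarily the paper's exact symbols) is:

```latex
\mathcal{L}_{\text{GAN}}(G_A, D_B) =
  \mathbb{E}_{y \sim p_d(B)}\big[\log D_B(y)\big] +
  \mathbb{E}_{x \sim p_d(A)}\big[\log\big(1 - D_B(G_A(x))\big)\big]
```

A symmetric term with G_B and D_A handles the other translation direction.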

Additionally, the authors introduce more loss functions, including a segmentation network, to preserve the shape of the data generated by the CycleGAN. The loss function below also acts as a regularizer for the GANs.
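These losses were also images in the original post. A hedged reconstruction: the cycle-consistency term is the usual L1 reconstruction loss, and the shape-consistency term asks the segmentor of the target domain to recover the source volume's label map from the translated volume (here l_x, l_y denote label maps and CE a voxel-wise cross-entropy; this notation is my assumption, not necessarily the paper's):

```latex
\mathcal{L}_{\text{cyc}}(G_A, G_B) =
  \mathbb{E}_{x}\!\left[\lVert G_B(G_A(x)) - x \rVert_1\right] +
  \mathbb{E}_{y}\!\left[\lVert G_A(G_B(y)) - y \rVert_1\right]

\mathcal{L}_{\text{shape}}(S_A, S_B, G_A, G_B) =
  \mathbb{E}_{x}\!\left[\mathrm{CE}\big(S_B(G_A(x)),\, l_x\big)\right] +
  \mathbb{E}_{y}\!\left[\mathrm{CE}\big(S_A(G_B(y)),\, l_y\big)\right]
```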

Multimodal Volume Segmentation / Objective

During training of the GAN, a segmentation network is also trained, and as the GAN creates more realistic data, the segmentor takes advantage of those images.

The final loss function can be seen above; lambda and alpha are hyper-parameters to be adjusted, and all of the other cost functions were explained above.
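Since the final objective was also an image, here is a hedged reconstruction consistent with the description above (two adversarial terms, a cycle term weighted by lambda, and a shape term weighted by alpha):

```latex
\mathcal{L} =
  \mathcal{L}_{\text{GAN}}(G_A, D_B) + \mathcal{L}_{\text{GAN}}(G_B, D_A)
  + \lambda\,\mathcal{L}_{\text{cyc}}(G_A, G_B)
  + \alpha\,\mathcal{L}_{\text{shape}}(S_A, S_B, G_A, G_B)
```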

Network Architecture and Details / Training Details

The network actually uses 3D convolution operations on the volume data, hence training is much more difficult than for 2D convolutional networks. Additionally, the authors used instance normalization. The generator network of this paper was heavily influenced by U-Net, but with less aggressive down-sampling. The discriminator was influenced by PatchGAN, and a U-Net-type network was again used for segmentation. As seen above, the authors' generator network does a better job at generating synthetic images than a pure CycleGAN.
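Instance normalization, unlike batch normalization, normalizes each (sample, channel) pair over its own spatial dimensions. A minimal numpy sketch for 3D volumes (shapes and epsilon are illustrative choices, not from the paper):

```python
import numpy as np

def instance_norm_3d(x, eps=1e-5):
    """Normalize each (batch, channel) slice over its spatial dims.

    x: array of shape (batch, channels, depth, height, width).
    """
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

vol = np.random.randn(2, 4, 8, 8, 8) * 3.0 + 1.0
out = instance_norm_3d(vol)
print(out.mean(axis=(2, 3, 4)))  # ~0 for every instance/channel pair
```

Frameworks provide this directly (e.g. an `InstanceNorm3d`-style layer), usually with learnable scale and shift on top.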

For optimization the authors used Adam, and they pre-trained the generators as well as the discriminators before the segmentation network. Finally, the authors also decreased the learning rate in the later stages of training.

Experimental Results

The authors used 4,354 contrasted cardiac CT scans from patients with various cardiovascular diseases as well as 142 cardiac MRI scans with a new compressed sensing scanning protocol. (Domain A was set to CT and B was set to MRI.)

Cross-domain Translation Evaluation

As seen in the figure above, the generated images do not show any geometric transformation and also look very realistic. Additionally, the authors propose an S-score to evaluate the shape-invariance quality of the synthetic images. A higher score means there was less geometric transformation in the given image, and we can see that the authors' network outperformed the compared networks.

Segmentation Evaluation

Here the authors trained two segmentation networks and followed different approaches to measure performance: an ad-hoc approach and the authors' end-to-end approach (as seen below).

Using the Dice score, the authors measured the performance of the different approaches, and we can observe that the authors' method gives the best performance.
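The Dice score is twice the overlap of the two masks divided by their total foreground size. A minimal sketch for binary masks (the paper evaluates multi-class cardiac structures, which would be scored per class):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient for two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1  # 4 foreground voxels
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1  # 6 voxels, 4 shared
print(round(dice_score(a, b), 3))  # 2*4 / (4 + 6) = 0.8
```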

Above is an example of a segmentation mask, and below are the plots of the performances.

Gap between synthetic and real data

Reducing the distribution gap between the real and generated data is key for this project. As seen below, the authors' method has a significantly smaller gap than the baseline Dice score where only real data were given.

Conclusion

In conclusion, the authors of this paper have successfully shown how to perform segmentation on medical images while performing cross-modality translation. This is done with a GAN that learns from unpaired data while keeping the general anatomical structure, and a segmentation network that takes advantage of the generated synthetic data.

Final Words

The number of experiments done for just this one paper is extraordinary; no wonder Zizhao Zhang is one of the best.

If any errors are found, please email me at jae.duk.seo@gmail.com, if you wish to see the list of all of my writing please view my website here.

Meanwhile follow me on my twitter here, and visit my website, or my Youtube channel for more content. I also implemented Wide Residual Networks, please click here to view the blog post.

Reference

  1. Zhang, Z., Yang, L., & Zheng, Y. (2018). Translating and Segmenting Multimodal Medical Volumes with Cycle- and Shape-Consistency Generative Adversarial Network. Arxiv.org. Retrieved 10 July 2018, from https://arxiv.org/abs/1802.09655
