Why You Should Consider Using Generative Adversarial Networks for Medical Data

Two Advantages of Using GANs for Medical Data

Sivan Biham
Towards Data Science

Photo by National Cancer Institute on Unsplash

Supervised deep learning methods require large amounts of data, which can be hard to obtain, especially in the medical domain. One reason is that medical data is private, so we cannot always obtain, use, or share it. The second reason is the difficulty of obtaining labels for this data: labeling requires expert annotators such as doctors and nurses, which is both costly and time-consuming. Even if we have a large amount of labeled data, we might encounter other issues, such as data imbalance. Rare conditions appear infrequently in the data, which creates a bias that affects our models’ results.

So how can we overcome these issues? Is there a method with the potential to help us?

One family of methods with this potential is generative models. In this post I will describe how generative methods, specifically Generative Adversarial Networks (GANs), can reduce the issues mentioned above. The post covers the data volume, balance, and privacy issues.

Generative Adversarial Network (GAN)

A GAN learns to generate new data with the same statistics as its training set. It consists of two networks, a generator and a discriminator, that compete with each other. The generator learns to map from a latent space to the data distribution of interest in order to generate new samples. The discriminator's goal is to distinguish candidates produced by the generator from samples of the true data distribution; in other words, to classify each sample as either real or generated. The generator's goal is to fool the discriminator into believing that a generated sample is a real one, thereby maximizing the discriminator’s error rate.
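The contest between the two networks is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014). Here $D(x)$ is the discriminator's estimated probability that $x$ is a real sample, $z$ is a latent vector drawn from a prior $p_z$ (typically a standard normal), and $G(z)$ is the generated sample:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator increases $V$ by assigning high probability to real samples and low probability to generated ones, while the generator decreases it by pushing $D(G(z))$ towards 1. In practice the generator is often trained to maximize $\log D(G(z))$ instead, a variant proposed in the same paper because it gives stronger gradients early in training.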

By using GANs, we can generate new medical samples and tackle the issues of data volume, balance, and privacy.

Data Volume and Balance

In many situations we want to develop an algorithm for a medical task but do not have enough data, and the data we do have is often imbalanced. Both issues can be addressed by generating high-quality synthetic data with GANs.

One example is DermGAN, a generative network that synthesizes clinical images of skin conditions. A map that encodes the skin condition, its region of presence (orange rectangle), and the skin color (red background) is passed through the generator to produce a synthetic image.

Image from the paper — https://arxiv.org/pdf/1911.08716.pdf

With DermGAN we can generate a large number of images of skin conditions, which addresses the data volume issue. We can also control the balance of the dataset by generating more samples of rare conditions, which addresses the data balancing issue.
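Deciding how many images to generate for each condition is simple bookkeeping. The minimal sketch below assumes we only know the label of every real training image and computes how many synthetic samples each class needs to reach the size of the largest class (or a chosen `target`). The function name and the class labels are hypothetical, and actually producing the images would require a conditional generator such as DermGAN, which is not shown here.

```python
from collections import Counter


def synthetic_counts_for_balance(labels, target=None):
    """Hypothetical helper: return how many synthetic samples to generate per
    class so that every class reaches `target` samples.

    `labels` is any iterable of class labels for the real training set.
    By default, `target` is the size of the largest class.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic samples.
    return {label: max(target - n, 0) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical label counts for a small, imbalanced skin-condition dataset.
    real_labels = ["eczema"] * 500 + ["psoriasis"] * 120 + ["rare_condition"] * 15
    plan = synthetic_counts_for_balance(real_labels)
    print(plan)  # {'eczema': 0, 'psoriasis': 380, 'rare_condition': 485}
    # A conditional generator (not shown) would then be asked for
    # plan[label] new images of each label.
```

Topping every class up to the majority count is only one possible policy. Passing a smaller `target` limits how much of a rare class ends up synthetic, which also keeps the ratio between real and synthetic data in check.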

This is only one example of work in this field.

Generative methods can produce almost any number of images, with whatever balance between classes we require. This is a step towards overcoming the data volume and balancing issues.

Data Anonymization

Most medical research is done using real patients' private medical data. There are several open-source datasets, but they are limited to specific fields and specific data distributions, which is not enough for developing real-world systems. To overcome this, each group creates its own medical dataset for the problem at hand. These datasets are private and cannot be shared with colleagues outside the group. As a result, knowledge sharing and reproducing results become almost impossible.

By using GANs to generate synthetic medical data, we could create new datasets, share them with the medical community, and push research even further.

Challenges

Although this sounds great, we need to take several things into account before using synthetic data. For example:

  • The ratio between real and synthetic data
  • The sampling technique used to sample images from the GAN
  • The distribution of the synthesized data
  • The quality of the synthesized data
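Two of these knobs, the real-to-synthetic ratio and the sampling technique, can be made concrete with a small sketch. It assumes the real and synthetic samples are already available as Python lists (placeholder strings here) and that the generator consumes standard-normal latent vectors. As one example of a sampling choice it uses the truncation trick popularized by BigGAN; it is not taken from any particular medical GAN, and `mix_real_and_synthetic` and `truncated_latents` are hypothetical helper names.

```python
import random


def mix_real_and_synthetic(real, synthetic, synthetic_fraction, seed=0):
    """Hypothetical helper: keep every real sample and add enough synthetic
    samples that roughly `synthetic_fraction` of the result is synthetic."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    if n_syn > len(synthetic):
        raise ValueError(f"need {n_syn} synthetic samples, only {len(synthetic)} available")
    rng = random.Random(seed)
    # Tag every sample with its origin so results can later be broken down by source.
    mixed = [(x, "real") for x in real]
    mixed += [(x, "synthetic") for x in rng.sample(synthetic, n_syn)]
    rng.shuffle(mixed)
    return mixed


def truncated_latents(n, dim, threshold=1.0, seed=0):
    """Hypothetical helper: draw `n` latent vectors of length `dim` from N(0, 1),
    redrawing any component with |z| > threshold (the truncation trick).
    Smaller thresholds tend to give more typical but less diverse samples."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = random.Random(seed)
    latents = []
    for _ in range(n):
        vector = []
        for _ in range(dim):
            z = rng.gauss(0.0, 1.0)
            while abs(z) > threshold:  # reject and redraw out-of-range values
                z = rng.gauss(0.0, 1.0)
            vector.append(z)
        latents.append(vector)
    return latents


if __name__ == "__main__":
    # Placeholder data; in practice these would be images.
    real = [f"real_{i}" for i in range(300)]
    synthetic = [f"gan_{i}" for i in range(1000)]
    mixed = mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.25)
    n_synthetic = sum(1 for _, origin in mixed if origin == "synthetic")
    print(len(mixed), n_synthetic)  # 400 100

    # These latent vectors would be fed to a trained generator (not shown).
    z = truncated_latents(n=4, dim=128, threshold=0.7)
    print(len(z), len(z[0]))  # 4 128
```

Which fraction and threshold work best is an empirical question for each task; the sketch only makes these choices explicit and reproducible, so their effect on the model can be measured.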

We should keep these points in mind when we consider using synthetic data in our models.

Summary

I hope I have convinced you that the medical community can benefit from developing and improving GAN methods to generate high-quality synthetic medical data. I am not sure we have reached the point where we can rely on synthesized data alone, especially not in every domain. But I think it is good motivation to put effort into developing GAN methods for medical data. I will keep investigating this in my own domain and keep you posted.
