
Synthetic Data Generation Using Conditional-GAN

Modifying the GAN architecture to get more control over the type of data generated

Photo by Siti Rahmanah Mat Daud on Unsplash

Why Is There a Need to Generate Data?

In the world of information technology, companies use data to improve the customer experience and provide better services to their customers. However, collecting that data can be tedious and costly.

In this article, we will discuss GANs, and especially the Conditional GAN, a method we used for synthetic data generation at Y-Data, and how they can be used to generate synthetic datasets.

GAN (Generative Adversarial Network)

GAN was proposed by Ian Goodfellow et al.¹ in 2014. The GAN architecture consists of two components, called the Generator and the Discriminator. In simple words, the role of the generator is to generate new data (numbers, images, etc.) that is as close as possible to the dataset provided as input, while the role of the discriminator is to distinguish between generated data and real input data.

Let’s visit the algorithmic working of GAN in detail:

• The generator takes a vector of random numbers (latent noise) as input and returns a generated image.

Generator Input (source: here)

• The generated image, along with a sample of real images, is passed as input to the discriminator.

Discriminator Input (source: here)

• The discriminator takes samples of both types: images from the real dataset and generated images. It returns a probability between 0 and 1, where a value closer to 1 means the image is more likely to belong to the real dataset, and a value closer to 0 means it is more likely to come from the generated samples.

• Misclassifications by the discriminator are penalized through the discriminator loss. This loss is backpropagated to update the discriminator's weights, which in turn improves its predictions.

GAN discriminator training (source: here)

• The generator loss is then computed from the discriminator's classification of the generated images. It is backpropagated through both the discriminator and the generator to compute the gradients, but only the generator's weights are updated with them.

GAN generator training (source: here)
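To make the two losses concrete, here is a minimal numeric sketch of the penalties described above. The probabilities are made up for illustration and are not from the article's code:

```python
import numpy as np

def bce(p, target):
    # Binary cross-entropy of a predicted probability p against a 0/1 target
    eps = 1e-12  # avoid log(0)
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps))

# Suppose the discriminator outputs these probabilities:
d_real = 0.9  # D(x) for a real image; ideally close to 1
d_fake = 0.2  # D(G(z)) for a generated image; ideally close to 0

# Discriminator loss: penalize calling real images fake and fake images real
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)

# Generator loss (non-saturating form): the generator wants D(G(z)) -> 1
g_loss = bce(d_fake, 1.0)

print(f"d_loss={d_loss:.4f}, g_loss={g_loss:.4f}")  # d_loss=0.3285, g_loss=1.6094
```

Note that a confident, correct discriminator (d_real near 1, d_fake near 0) makes d_loss small and g_loss large, which is exactly the adversarial pressure that drives the generator to improve.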

Conditional-GAN

Although GAN is able to generate some good examples of data points, it cannot generate a data point with a specific target label, and the dataset generated from it lacks diversity.

Conditional GAN was proposed by M. Mirza and S. Osindero² in late 2014. They modified the architecture by adding the label y as an additional input to the generator, so that it tries to generate the corresponding data point. The label is also added to the discriminator input, helping it distinguish real data better.

Below is the architecture of Conditional GAN:

C-GAN architecture

In this architecture, the random input noise Z is combined with the label Y in the joint hidden representation, and the GAN training framework allows for a lot of flexibility in how it receives input.

The real data points x and their labels y are passed to the discriminator along with the generator output G(z|y), just as in the vanilla GAN architecture.

The loss function for the conditional GAN is similar to that of the vanilla GAN:

Conditional GAN loss function
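For reference, the objective from Mirza and Osindero's paper² can be written out as the same minimax game as the vanilla GAN, with both networks conditioned on the label y:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]
```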

Python Implementation

In this implementation, we will be applying the conditional GAN on the Fashion-MNIST³ dataset to generate images of different clothes. This dataset contains 70,000 images (60,000 training and 10,000 test) of size 28×28 in grayscale format, with pixel values between 0 and 255.

There are 10 categorical labels in the dataset, each represented by a number between 0 and 9. The following list shows the corresponding labels:

0: T-shirt/top

1: Trouser

2: Pullover

3: Dress

4: Coat

5: Sandal

6: Shirt

7: Sneaker

8: Bag

9: Ankle boot

Now, let's start working on the code. First, we import the required modules; we will use the TensorFlow Keras API to design our architectures. Some of the code is adapted from this book⁴:
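The import-and-load step is shown as an image in the original; a sketch of what it likely contains, using the standard tf.keras API (variable names here are my own):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# Load Fashion-MNIST: 60,000 training and 10,000 test grayscale 28x28 images
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Scale pixel values from [0, 255] to [-1, 1] to match a tanh generator output
X_train = X_train.astype("float32") / 127.5 - 1.0
X_train = np.expand_dims(X_train, axis=-1)  # shape: (60000, 28, 28, 1)

latent_dim = 100   # size of the random noise vector z
num_classes = 10   # labels 0-9 (T-shirt/top ... Ankle boot)
```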

Next, we define the generator architecture. This generator takes a combination of a random latent vector and a label as input and generates an image of that label.

Generator architecture (source: code)
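The generator code itself appears as an image; below is a self-contained sketch in the spirit of the CGAN from GANs in Action⁴. Layer sizes and names are my assumptions, not the article's exact code:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(latent_dim=100, num_classes=10):
    noise = layers.Input(shape=(latent_dim,))
    label = layers.Input(shape=(1,), dtype="int32")

    # Embed the label and merge it with the noise, conditioning the output on it
    label_emb = layers.Flatten()(layers.Embedding(num_classes, latent_dim)(label))
    joined = layers.multiply([noise, label_emb])

    x = layers.Dense(128 * 7 * 7, activation="relu")(joined)
    x = layers.Reshape((7, 7, 128))(x)
    x = layers.Conv2DTranspose(64, kernel_size=3, strides=2, padding="same")(x)  # 7x7 -> 14x14
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(0.01)(x)
    # tanh output matches images scaled to [-1, 1]
    img = layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="same",
                                 activation="tanh")(x)  # 14x14 -> 28x28
    return models.Model([noise, label], img)

generator = build_generator()
z = np.random.normal(0, 1, (2, 100)).astype("float32")
labels = np.array([[3], [7]])  # e.g. Dress and Sneaker
fake_imgs = generator([z, labels])
print(fake_imgs.shape)  # (2, 28, 28, 1)
```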

Next, we define the discriminator architecture. It takes an image and a label as input and outputs the probability of the image being real rather than generated.

Discriminator architecture (source: code)
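As with the generator, the discriminator code is an image in the original; a self-contained sketch under the same assumptions (the label is embedded into a 28×28 map and stacked onto the image as an extra channel):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_discriminator(img_shape=(28, 28, 1), num_classes=10):
    img = layers.Input(shape=img_shape)
    label = layers.Input(shape=(1,), dtype="int32")

    # Embed the label into a 28x28 map and concatenate it as a second channel
    label_emb = layers.Flatten()(layers.Embedding(num_classes, 28 * 28)(label))
    label_map = layers.Reshape((28, 28, 1))(label_emb)
    x = layers.Concatenate(axis=-1)([img, label_map])

    x = layers.Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.01)(x)
    x = layers.Conv2D(64, kernel_size=3, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.01)(x)
    x = layers.Flatten()(x)
    # Probability that the (image, label) pair comes from the real dataset
    validity = layers.Dense(1, activation="sigmoid")(x)
    return models.Model([img, label], validity)

discriminator = build_discriminator()
imgs = np.random.uniform(-1, 1, (2, 28, 28, 1)).astype("float32")
labels = np.array([[0], [9]])
p = discriminator([imgs, labels])
print(p.shape)  # (2, 1)
```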

In the next step, we will combine these two architectures to form a complete C-GAN architecture:
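The combining step might look like the sketch below. The small dense models are stand-ins so the snippet is self-contained; the key idea is freezing the discriminator inside the stacked model, so that training the combined model updates only the generator's weights:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

latent_dim, num_classes = 100, 10

def tiny_generator():
    noise = layers.Input(shape=(latent_dim,))
    label = layers.Input(shape=(1,), dtype="int32")
    emb = layers.Flatten()(layers.Embedding(num_classes, latent_dim)(label))
    x = layers.multiply([noise, emb])
    img = layers.Reshape((28, 28, 1))(layers.Dense(28 * 28, activation="tanh")(x))
    return models.Model([noise, label], img)

def tiny_discriminator():
    img = layers.Input(shape=(28, 28, 1))
    label = layers.Input(shape=(1,), dtype="int32")
    emb = layers.Flatten()(layers.Embedding(num_classes, 28 * 28)(label))
    x = layers.Concatenate()([layers.Flatten()(img), emb])
    validity = layers.Dense(1, activation="sigmoid")(layers.Dense(64, activation="relu")(x))
    return models.Model([img, label], validity)

# The discriminator trains on its own, so compile it first
discriminator = tiny_discriminator()
discriminator.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

# Freeze the discriminator inside the stacked model: gradients still flow
# through it, but only the generator's weights change when cgan trains
generator = tiny_generator()
discriminator.trainable = False
noise = layers.Input(shape=(latent_dim,))
label = layers.Input(shape=(1,), dtype="int32")
validity = discriminator([generator([noise, label]), label])
cgan = models.Model([noise, label], validity)
cgan.compile(loss="binary_crossentropy", optimizer=Adam())
```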

Now it's time to train the model. First, we define the sampling function, and then we train the model.

2000 [D loss: 0.097897, acc.: 96.88%] [G loss: 4.837958]
4000 [D loss: 0.084203, acc.: 98.05%] [G loss: 5.004930]
6000 [D loss: 0.111222, acc.: 97.66%] [G loss: 4.664765]
8000 [D loss: 0.091828, acc.: 97.27%] [G loss: 5.158591]
10000 [D loss: 0.110758, acc.: 98.44%] [G loss: 5.750035]
12000 [D loss: 0.152362, acc.: 92.58%] [G loss: 4.261237]
14000 [D loss: 0.075084, acc.: 96.48%] [G loss: 5.566125]
16000 [D loss: 0.108527, acc.: 96.88%] [G loss: 9.546798]
18000 [D loss: 0.043660, acc.: 99.61%] [G loss: 6.730018]
20000 [D loss: 0.075157, acc.: 97.66%] [G loss: 6.238955]
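A loop like the one that produced this log could be sketched as follows. To keep the snippet self-contained and runnable, it uses small stand-in models and an explicit GradientTape step rather than the article's (unshown) compiled-model calls; all names are illustrative:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim, num_classes, batch_size = 100, 10, 32

# Stand-in data; the article trains on Fashion-MNIST scaled to [-1, 1]
X_train = np.random.uniform(-1, 1, (256, 28, 28, 1)).astype("float32")
y_train = np.random.randint(0, num_classes, (256, 1))

def tiny_generator():
    z = layers.Input(shape=(latent_dim,))
    y = layers.Input(shape=(1,), dtype="int32")
    emb = layers.Flatten()(layers.Embedding(num_classes, latent_dim)(y))
    img = layers.Reshape((28, 28, 1))(
        layers.Dense(28 * 28, activation="tanh")(layers.multiply([z, emb])))
    return models.Model([z, y], img)

def tiny_discriminator():
    img = layers.Input(shape=(28, 28, 1))
    y = layers.Input(shape=(1,), dtype="int32")
    emb = layers.Flatten()(layers.Embedding(num_classes, 28 * 28)(y))
    x = layers.Concatenate()([layers.Flatten()(img), emb])
    return models.Model([img, y], layers.Dense(1, activation="sigmoid")(x))

generator, discriminator = tiny_generator(), tiny_discriminator()
bce = tf.keras.losses.BinaryCrossentropy()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)

for step in range(1, 4):  # the article's log runs to 20,000 steps
    # --- discriminator step: real batch labeled 1, generated batch labeled 0 ---
    idx = np.random.randint(0, len(X_train), batch_size)
    imgs, labels = X_train[idx], y_train[idx]
    z = np.random.normal(0, 1, (batch_size, latent_dim)).astype("float32")
    with tf.GradientTape() as tape:
        fake = generator([z, labels], training=True)
        d_real = discriminator([imgs, labels], training=True)
        d_fake = discriminator([fake, labels], training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))

    # --- generator step: try to make the discriminator output "real" (1) ---
    sampled = np.random.randint(0, num_classes, (batch_size, 1))
    z = np.random.normal(0, 1, (batch_size, latent_dim)).astype("float32")
    with tf.GradientTape() as tape:
        fake = generator([z, sampled], training=True)
        g_loss = bce(tf.ones_like(d_fake), discriminator([fake, sampled], training=True))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))

    print(f"{step} [D loss: {float(d_loss):.6f}] [G loss: {float(g_loss):.6f}]")
```

At every step the generator is also asked for images of randomly sampled labels, which is what lets the trained model produce a garment of any requested class.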

Now, we generate the GIF of the generated images; some of this code has been taken from here.

Generated_Images (source: code)

Conclusion

In this article, we discussed GAN and Conditional-GAN for the purpose of synthetic data generation. While GAN was a revelation in the field of synthetic data generation, researchers have since modified the idea behind it to suit their needs. One such modification is Conditional-GAN, which allows us to generate data for a required label.


References

[1]: Ian J. Goodfellow, J. Pouget-Abadie, M. Mirza, and Others, Generative Adversarial Networks (2014).

[2]: M. Mirza, S. Osindero, Conditional Generative Adversarial Nets (2014).

[3]: H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms.

[4]: J. Langr, V. Bok, GANs in Action: Deep Learning with Generative Adversarial Networks (2019), Manning Publications

Portions of this page are modifications based on work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.

