Deepfakes: The Ugly, and The Good

It's our job to ensure the technologies we develop are leveraged for good.

Joey Mach
Towards Data Science

--

Yesterday, my friend made me play a game called Which Face Is Real. Yes, I know this sounds lame, but I literally made the wrong guess every single time.

Which face is real? Well, I actually don't know. ❓

This made me realize something: humanity is simply a really dumb species. I don't mean that in an offensive way, though I suppose it's offensive either way (sorry!) 😅.

"The first thing we should assume is we are very dumb. We can definitely make things smarter than ourselves." - Elon Musk 😶

It's not a question of whether artificial intelligence will be smarter than human beings; it already is in several areas. The real question is how we can leverage this intelligence for good.

The advancement of artificial intelligence has followed an exponential curve over the past few years.

That's right, AI is disrupting almost every single field. 💪

Merely 10 years ago, things like Siri and Alexa didn't even exist.

Siri vs. Alexa Rap Battle: that's how good AI has become! 😂

Today, we can leverage AI to detect cancer from medical images, Google Assistant can book appointments for you over the phone by mimicking a human voice, and generating fake images that are nearly indistinguishable from real ones has never been easier.

Widespread concerns about privacy and misinformation have turned the spotlight on deepfakes: fake media produced using algorithms such as autoencoders and Generative Adversarial Networks.

In the wrong hands, this technology can be used for fraud. For instance, a deepfaked voice was recently used to scam the CEO of a UK firm out of $244,000 🤯.

The Technology behind Deepfakes: Generative Adversarial Networks

The emergence of Generative Adversarial Networks (GANs) has transformed how fake images are created.

Previously, we relied on manual methods like Photoshop, but with Generative Adversarial Networks, the process is automated and the results are generally far better.

The Generative Adversarial Network is a relatively new neural network architecture, first introduced in 2014 by Ian Goodfellow. The objective of a GAN is to produce fake images that are as realistic as possible.

There are two components involved in a GAN:

  1. The Generator - generates the images
  2. The Discriminator - classifies whether a generated image is fake or real

The generator takes a latent sample, a vector of random noise, as input. By leveraging de-convolutional layers, which are essentially the reverse of convolutional layers, it produces an image.

De-convolutional layers

Convolutional layers are responsible for extracting features from an input; de-convolutional layers perform the reverse, taking the features as input and producing an image as output.

Let's say you're playing charades with friends. The person acting out the phrase is performing "de-convolutions", while the other players guessing the phrase are performing "convolutions". The "acting" is analogous to the image, while the "guessing" is analogous to the features of the image.

Charades - reminds me of the fun times

In essence, the convolutional layers identify the features of the image, and the de-convolutional layers construct an image given its features.
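To make this more concrete, here is a minimal sketch of a generator built from Conv2DTranspose ("de-convolutional") layers, assuming TensorFlow/Keras. The latent size, layer widths, and 64x64 output resolution are illustrative assumptions, not the architecture of any particular deepfake system.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    """Turn a random noise vector into a 64x64 RGB image."""
    return tf.keras.Sequential([
        layers.Dense(8 * 8 * 256, activation="relu", input_shape=(latent_dim,)),
        layers.Reshape((8, 8, 256)),  # start from a small 8x8 feature map
        # Each Conv2DTranspose ("de-convolution") doubles the spatial size.
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),  # 16x16
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),   # 32x32
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),    # 64x64 RGB
    ])

generator = build_generator()
noise = tf.random.normal((1, 100))   # the latent sample: a vector of random noise
fake_image = generator(noise)        # shape (1, 64, 64, 3)
```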

Convolutional layers

The discriminator, on the other hand, leverages convolutional layers for image classification: its job is to predict whether the image produced by the generator is real (1) or fake (0). The objective of the generator is to produce images that are as realistic as possible and fool the discriminator into thinking the generated images are real.

The discriminator's job is critical to the success of the generator. To help the generator produce more realistic images, the discriminator has to be really good at differentiating real images from fake images. The better the discriminator is, the better the generator will be.
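Here is a matching sketch of a simple discriminator, again assuming TensorFlow/Keras; the architecture is an illustrative assumption rather than a specific published model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator():
    """Classify a 64x64 RGB image as real (close to 1) or fake (close to 0)."""
    return tf.keras.Sequential([
        # Convolutional layers extract features from the image...
        layers.Conv2D(64, 4, strides=2, padding="same", activation="relu",
                      input_shape=(64, 64, 3)),
        layers.Conv2D(128, 4, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        # ...and a single sigmoid unit turns them into a real-vs-fake probability.
        layers.Dense(1, activation="sigmoid"),
    ])

discriminator = build_discriminator()
```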

After every iteration, the generator's learnable parameters (the weights and biases) are refined according to the feedback given by the discriminator.

The network updates the learnable parameters by backpropagating the gradients of the discriminator's output with respect to the generated image. Essentially, the discriminator tells the generator how it should tweak each pixel so that the image looks more realistic.

Let's say the generator creates an image and the discriminator thinks the image has a 0.29 (29%) probability of being real. The generator's job is to update its learnable parameters so that, after backpropagation, that probability rises to, say, 30%.
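In code, one such generator update might look roughly like the sketch below, assuming TensorFlow/Keras and the generator and discriminator sketched above; the batch size, optimizer, and learning rate are assumptions. In a full training loop, the discriminator would be updated in a similar step using both real and generated images.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
gen_optimizer = tf.keras.optimizers.Adam(1e-4)

def generator_train_step(generator, discriminator, batch_size=32, latent_dim=100):
    """One update of the generator, guided by the discriminator's feedback."""
    noise = tf.random.normal((batch_size, latent_dim))
    with tf.GradientTape() as tape:
        fake_images = generator(noise, training=True)
        p_real = discriminator(fake_images, training=False)
        # The generator "wants" the discriminator to output 1 (real) for its fakes,
        # so its loss is lowest when p_real is pushed toward 1.
        loss = bce(tf.ones_like(p_real), p_real)
    # Backpropagate through the discriminator's output to the generator's weights.
    grads = tape.gradient(loss, generator.trainable_variables)
    gen_optimizer.apply_gradients(zip(grads, generator.trainable_variables))
    return loss
```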

The generator, however, relies on the discriminator to be successful.

A more intuitive way of thinking about this is to imagine the generator as the student and the discriminator as the teacher.

Does this spark any memories from grade school?

Remember the times when you wrote a test and were so stressed out about your results because you had no idea how well you'd done?

The student doesn't know what mistakes they've made until the teacher marks their test and gives them feedback. The student's job is to learn from that feedback to improve their test scores.

The better the teacher is at teaching, the better the student's academic performance gets.

Theoretically, the better the generator gets at creating almost flawlessly realistic images, the better the discriminator gets at differentiating real images from fake ones. The reverse is also true: the better the discriminator is at image classification, the better the generator will become.

Leveraging MobileNetV2 to detect Fake Images

To classify fake and real images, I used a pre-trained convolutional neural network known as MobileNetV2, which is trained on ImageNet, a dataset of 14 million images.

The main reason I leveraged MobileNetV2, the second version of MobileNet, instead of constructing my own convolutional neural network is that MobileNetV2 is designed to be mobile-friendly, which means it requires significantly less computational power.

By adding several additional layers on top of MobileNetV2 and training it on the Real vs. Fake images dataset, my model was able to differentiate fake images from real images with 75% accuracy.

The structure of my neural network looks something like this (a code sketch of the full model follows the list):

1. MobileNetV2

2. Average Pooling Layer - This layer is responsible for reducing the dimensions of the data by down-sampling. For instance, say we have a 4x4 input matrix and apply a 2x2 average pooling layer: for each 2x2 block in the input matrix, the average of those values is taken, reducing the output to a 2x2 matrix.

Average pooling layer

3. Dense Layer - A dense layer is just a regular fully connected layer where every node is connected to every node in the next layer. The ReLU activation function, R(z) = max(0, z), is applied to this layer; it restricts the outputs to non-negative values.

ReLU function

4. Batch Normalization Layer - This layer normalizes the inputs passed through it, enhancing the stability of the neural network.

5. Dropout Layer - This layer helps prevent overfitting by randomly setting the activations of certain nodes to 0 during training. The dropout layer has no learnable parameters (i.e., it isn't a trainable layer like the others).

6. The Output Layer - This is essentially a regular dense layer with the softmax activation function applied. Softmax produces outputs between 0 and 1 that sum to 1, representing a probability distribution over the predicted output classes.

Softmax function
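Putting the six pieces together, a minimal sketch of such a model in TensorFlow/Keras might look like the following; the dense layer width, dropout rate, and choice of global average pooling are illustrative assumptions rather than the exact configuration used.

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1. MobileNetV2, pre-trained on ImageNet, used as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,                                    # 1. MobileNetV2
    layers.GlobalAveragePooling2D(),         # 2. average pooling (down-sampling)
    layers.Dense(128, activation="relu"),    # 3. dense layer with ReLU
    layers.BatchNormalization(),             # 4. batch normalization
    layers.Dropout(0.5),                     # 5. dropout against overfitting
    layers.Dense(2, activation="softmax"),   # 6. output layer: P(real), P(fake)
])
```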

The accuracy of this model could easily be enhanced with a larger dataset. I trained the model on 2,041 images, but imagine if we trained it on 1 million images: the accuracy could be significantly higher!
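For reference, training such a model on a folder of real and fake images might look roughly like the sketch below; the directory path, image size, and number of epochs are hypothetical placeholders, not the actual setup.

```python
import tensorflow as tf

# Hypothetical layout: real_vs_fake/train/real/*.jpg and real_vs_fake/train/fake/*.jpg
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "real_vs_fake/train",            # placeholder path, not the actual dataset location
    image_size=(224, 224),
    batch_size=32,
    label_mode="categorical")        # two classes: real, fake
# (In practice you would also apply MobileNetV2's preprocess_input to the images.)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)       # epoch count is just an example
```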

Applications of GANs

The applications of GANs aren't restricted to negative purposes like creating misinformation; the technology can be applied to so many industries in a positive way!

Some pretty interesting uses of GANs:

  • Generating art - Yup, who said machines can't do creative things? Well, I guess they can!
  • Producing music 🎵 - People actually enjoy AI-generated music!
  • Creating fake images of humans
  • Generating fake anime images - I'm not a huge fan of anime, but if you are, well, awesome!
Fake images generated by GANs

More practical applications: 🚀

By leveraging GANs we can potentially accelerate advancements in machine learning itself!

We can leverage GANs to tackle the problem of small datasets, one of the major bottlenecks in machine learning, by generating new data similar to the real data already in the dataset.

Having a larger dataset is one of the most reliable ways to enhance the performance of classification-based machine learning programs.

šŸ–¼ļø One of the potential applications might be to leverage GANs to generate synthetic brain-MRI images to improve the detection of glioma, a type of brain cancer.

💊 The most exciting application is leveraging GANs to synthesize new molecules or drug candidates that target a specific disease-causing protein or biomarker.

To do this, the generator is responsible for generating the new molecule, and the discriminator predicts whether the molecule can inhibit or activate a certain protein. GANs can be used to advance and accelerate the process of drug discovery.

🎥 In the film and comic industries, GANs can be used for text-to-image synthesis.

There are so many more practical applications of GANs! This technology has the potential to disrupt many industries, ranging from healthcare to film! It's up to us to ensure that it is applied for the right purposes and complements humanity.

Key Takeaways:

  • GANs are a technology used to create synthetic data and involve two components: the generator and the discriminator.
  • By leveraging convolutional neural networks, my machine learning program was able to detect fake images generated by GANs with 75% accuracy.
  • GANs can be used for both good and bad: while they can create fake media, they can also be used to synthesize new molecules for drug discovery.

Don't forget to:

  • Connect with me on LinkedIn
  • Visit my website to take a sneak peek at my full portfolio!
