
AI Generated Synthetic Media, aka deepfakes

Advancements in artificial intelligence (AI) and cloud computing have made sophisticated audio, video, and image manipulation fast and easy.

Ashish Jaiman
Towards Data Science
7 min read · Aug 9, 2020

--

The Book is now available at Amazon — https://www.amazon.com/Deepfakes-aka-Synthetic-Media-Humanity-ebook/dp/B0B846YCNJ/

Introduction

Imagine that a few days before an election, a video of a candidate is released showing them using hate speech, racial slurs, and epithets that undercut their image as pro-minority. Imagine a teenager watching, mortified, as an explicit fake video of themselves goes viral on social media. Imagine a CEO on the road to raise money when an audio clip voicing her fears and anxieties about her product is sent to investors, ruining her chances of success.

All the above scenarios are fake and made up, but they can be made believable by AI-generated synthetic media, also called deepfakes[1]. The same technology that lets a mother who is losing her voice to Lou Gehrig's disease talk to her family with a synthetic voice can also generate a fake speech by a political candidate to damage their reputation. The same technology that gives a teacher the ability to engage her students with synthetic videos can also create a fake video of a teenager to damage her reputation.

Advancements in artificial intelligence (AI) and cloud computing technologies, such as GPU (graphics processing unit) virtual machines and platform services, have led to rapid growth in the sophistication of audio, video, and image manipulation techniques. Access to commodity cloud computing, publicly available AI research and algorithms, and abundant, diverse media data have created a perfect storm that democratizes the creation of synthetic media. This AI-generated synthetic media is referred to as deepfakes. Social platforms then democratize its distribution at scale.

Deepfakes are synthetic ("fake") media generated using the artificial intelligence technique of deep learning; the name combines "deep" and "fake."

Innovation and research in generative adversarial network (GAN) techniques, combined with the growing availability of computing power, have improved the quality of synthetic data at a surprising pace. New tools, many of which are publicly available, can manipulate media in increasingly believable ways, such as cloning a public figure's voice or superimposing one person's face on another person's body. GANs and deepfakes have evolved from research and academic topics into practical applications that businesses use to innovate, entertain, and engage socially.

Cheapfakes are simple manipulations made with conventional editing techniques such as speeding, slowing, and cutting, as well as nontechnical manipulations like restaging or recontextualizing existing media. One example of a cheapfake is the "Drunk Pelosi" video[2]. Lately, we have also seen recoloring and retouching used in a few political advertisements, which can likewise be considered cheapfakes[3].

Cheapfakes, or shallowfakes, are manipulated media created with straightforward image and video editing techniques to spread mis/disinformation or to change the narrative of a story.

Types of Deepfakes

Deepfakes have become synonymous with face swapping and lip-syncing, but many other types of AI-based manipulation of audio, video, and images also qualify as deepfakes.

Face-swapping

Face swapping is when one person's face is replaced or reconstructed with another person's face or with key features from another face. Face swapping or manipulation with filters is a common feature of almost all social media and video-chatting apps. Snapchat has offered face filters since 2014; using its face-detecting lens technology, you can make yourself look old, add beauty filters, or give yourself cat ears and whiskers. The output of these apps qualifies as AI-generated synthetic media, or deepfakes. A vast number of free and paid apps and online tools make it simple to swap the faces of two individuals. Developers can use open-source code from the Faceswap and DeepFaceLab projects on GitHub to create very sophisticated deepfakes with some effort to customize the code and train AI models.
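The final compositing step of a face swap can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from Faceswap or DeepFaceLab (the function name, rectangular box, and blending weight are all assumptions): real pipelines first detect and align faces, then use a trained autoencoder to reconstruct one identity with the other's expression, and blend along a face-shaped mask rather than a rectangle.

```python
import numpy as np

def paste_face(target, patch, box, alpha=0.8):
    """Alpha-blend a face patch into a rectangular region of the target.

    target: HxWx3 float image; patch: hxwx3 float image whose shape
    matches box; box: (y0, y1, x0, x1) in target coordinates.
    In a real face swap, `patch` would come from a decoder trained on
    the source identity, warped to the target's facial landmarks.
    """
    y0, y1, x0, x1 = box
    out = target.copy()
    # Weighted blend: alpha of the new face, (1 - alpha) of the original.
    out[y0:y1, x0:x1] = alpha * patch + (1.0 - alpha) * target[y0:y1, x0:x1]
    return out
```

A higher `alpha` makes the pasted face more dominant; mask-based blending and color correction are what make production tools look seamless.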

Puppeteering

Puppeteering is rendering manipulated full-body actions and behavior using AI. The technique creates a 3D model of the target's face and body in a video so that the target acts and speaks as the puppeteer directs. It is also known as full-body deepfakes. In August 2018, UC Berkeley researchers presented a paper called "Everybody Dance Now"[4], showing how AI can transfer a professional dancer's moves onto the bodies of amateurs. Data Grid, a Japanese artificial intelligence company, created an AI engine that automatically generates virtual models for advertising and fashion.

Lip-sync

Lip-syncing is a technique to render mouth movements and facial expressions that make the target appear to say things in their own voice, with the right tone and pitch. AI algorithms can take an existing video of a person talking and alter the lip movements to match new audio. The audio may be an older speech taken out of context, an impersonator speaking, or synthesized speech. Actor and director Jordan Peele used this technique to create a viral fake video of President Obama.

Voice Cloning

Voice cloning uses deep-learning algorithms that take in voice recordings of an individual and generate a synthetic voice remarkably similar to the original. It is a technique to create a custom voice font for an individual and then use that font to generate speech. Numerous apps and cloud services, such as Microsoft Custom Voice, Lyrebird AI, iSpeech, and VOCALiD, give individuals and businesses access to this technology to improve their agency.

Image Synthesis

Image generation, or image synthesis, is a technique that uses computer vision, deep learning, and generative adversarial networks (GANs) to synthesize new images. It can produce a computer-generated image of a person or object that does not exist. A team at NVIDIA trained a model on pictures of human faces pulled from Flickr to create the website ThisPersonDoesnotExist.com; there are further examples at ThisXDoesnotExist.com.
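The adversarial idea behind GAN image synthesis can be made concrete with a toy example. The sketch below is a hypothetical 1-D setup (none of these functions or parameters come from a real library): a linear "generator" maps noise to samples, a logistic "discriminator" scores how real a sample looks, and the two standard GAN losses are computed. In training, the discriminator and generator would alternately take gradient steps against each other.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(z, a, b):
    # Toy linear generator: map noise z to candidate samples a*z + b.
    return a * z + b

def discriminator(x, w, c):
    # Toy logistic discriminator: probability that x is a real sample.
    return sigmoid(w * x + c)

def gan_losses(x_real, z, gen_params, disc_params):
    """Standard GAN objectives: the discriminator maximizes
    log D(x) + log(1 - D(G(z))) (so its loss is the negative of that),
    while the generator minimizes -log D(G(z))."""
    a, b = gen_params
    w, c = disc_params
    x_fake = generator(z, a, b)
    d_real = discriminator(x_real, w, c)
    d_fake = discriminator(x_fake, w, c)
    d_loss = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss
```

With an uninformative discriminator (w = c = 0, so D outputs 0.5 everywhere), the losses reduce to 2·ln 2 and ln 2, a useful sanity check; real image GANs replace these scalar functions with deep convolutional networks.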

Text Generation

Text generation is a method of using deep-learning techniques to automatically generate text: writing stories, prose, and poems, creating abstracts of long documents, and synthesizing new passages. Using RNNs (recurrent neural networks) and, more recently, GANs and transformer models, there are many practical use cases for text generation. Text generation can power the industry's emerging automated or robot journalism efforts. OpenAI's GPT-3 can generate many kinds of text, including guitar tabs and computer code.
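The core mechanism these models share, predicting the next token from preceding context, can be illustrated with a far simpler character-level Markov chain. This is an illustrative stand-in, not how RNNs or GPT-3 work internally (they learn continuous representations rather than lookup tables), and all names here are made up for the sketch:

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each `order`-character context to the characters that follow it.
    Neural generators learn this context-to-next-token mapping with
    millions of parameters instead of an explicit table."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed, length, rng=None):
    """Extend `seed` one character at a time by sampling from the
    characters observed after the current context."""
    rng = rng or random.Random(0)
    out = seed
    order = len(seed)
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:
            break  # unseen context: nothing to sample from
        out += rng.choice(choices)
    return out
```

Trained on a large corpus with a higher order, even this toy produces locally plausible text, which hints at why scaled-up next-token predictors like GPT-3 are so fluent.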

Positive Use

Technology is empowering and a great enabler. It can give people a voice, purpose, and the ability to make an impact at scale and with speed. Advancements in data science and artificial intelligence have created new ideas and capabilities for empowerment. AI-generated synthetic media has many positive use cases, and it can create possibilities and opportunities for all people, regardless of who they are and how they listen, speak, or communicate. Deepfake technology has clear benefits in areas such as accessibility, education, film production, criminal forensics, and artistic expression.

More on Positive Use Cases of Deepfakes

Malicious Use

As with any new technology, malicious actors will take advantage of the innovation and use it for their own benefit. GANs and deepfakes are no longer just research topics or engineering toys. Having started as an innovative research concept, they can now be used as a communication weapon. Deepfakes are becoming easy to create and even easier to distribute in a policy and legislative vacuum.

Deepfakes make it possible to fabricate media (swapped faces, lip-syncing, and puppeteering), mostly without consent, and threaten psychological security, political stability, and business continuity. Deepfakes can be used to damage reputations, fabricate evidence, defraud the public, and undermine trust in democratic institutions. In the last two years, the potential for malicious use of synthetic data created with generative AI models has begun to cause alarm. The technology has now advanced to the point that it can be weaponized to inflict harm on individuals, societies, institutions, and democracies. Deepfakes not only cause direct harm but also further erode already declining trust in the media. They can even help public figures hide immoral acts behind a veil of deepfakes and fake news, letting them dismiss their actual harmful actions as false, a phenomenon known as the liar's dividend.

Deepfakes contribute to factual relativism and enable authoritarian leaders to thrive.

Deepfakes can be used by non-state actors, such as insurgent groups and terrorist organizations, to depict their adversaries making inflammatory speeches or engaging in provocative actions in order to stir up anti-state sentiment. For instance, a terrorist organization could easily create a deepfake video showing soldiers dishonoring a religious place to inflame existing anti-state emotions and cause further discord. States can use similar tactics to spread computational propaganda against a minority community or another country, for instance a fake video showing a police officer shouting anti-religious slurs or a political activist calling for violence.

All this can be achieved with fewer resources, at internet scale and speed, and even microtargeted to galvanize support.

More on Malicious uses of Deepfakes

Countermeasures

To defend the truth and secure freedom of expression, we need a multi-stakeholder and multimodal approach. Any countermeasure against the negative societal impact of malicious deepfakes must have a twofold objective: first, to reduce exposure to malicious deepfakes, and second, to minimize the damage they can inflict.

Effective countermeasures to malicious deepfakes fall into four broad categories: legislative action and regulation, platform policies and governance, technological intervention, and media literacy.

Technical Countermeasures to Deepfakes

Media Literacy is an Effective Countermeasure

[I will explore Policy and Regulation in a future article]


thoughts on #AI, #cybersecurity, #techdiplomacy sprinkled with opinions, social commentary, innovation, and purpose https://www.linkedin.com/in/ashishjaiman