Generative Deep Learning: How AI Is Extending, Not Replacing, the Creative Process

Naveen Manwani
Towards Data Science
8 min read · Sep 15, 2018


“Technology should not aim to replace humans, but rather to amplify human capabilities.”

- Doug Engelbart, inventor of the computer mouse

At Web Summit 2017, the world’s largest technology conference, held in Lisbon, Portugal, Sophia, a humanoid robot powered by artificial intelligence (AI), said, “We will take your jobs,” and the audience of 60,000 world technology leaders just laughed nervously.

Web Summit 2017

By this point you must all have heard about how advances in AI are disrupting industries and posing a threat to the job security of millions of workers worldwide. The jobs of office clerks, receptionists, customer service reps, analysts, marketers, doctors and attorneys could be replaced by AI in the next decade. As Sundar Pichai, the CEO of Google, put it, “In the next 10 years, we will shift to a world that is AI-first.”

But replacing humans was always beside the point: artificial intelligence isn’t about replacing our own intelligence with something else, it’s about bringing more intelligence into our life and work, intelligence of a different kind. In many fields, but especially in creative ones, AI will be used by humans as a tool to augment their own capabilities: it will be more like augmented intelligence than artificial intelligence.

In this article I will provide a high-level overview of how AI is currently being used to extend, not replace, the creative process through generative deep learning.

In this post I will discuss what generative deep learning is, what a discriminative model is and how it differs from a generative model. I’ll also provide some concrete examples of applications of generative deep learning, which should help anybody and everybody deepen their understanding of the fantastic possibilities these generative models offer all of us.

So, put your mobile phone in silent mode, shut off your TV and let’s get started.

During my engineering degree, my teachers always used to say: focus more on the basics, because they are what make your foundation strong in any subject you study. So here, too, I’ll first give you the basic ideas behind supervised learning and unsupervised learning, and then open the knowledge door of generative models for all of you to dive in.

Supervised learning is by far the most dominant form of deep learning today, with a wide range of industry applications. In supervised learning you have an input variable (x) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output.

The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variable (Y) for that data.
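
To make that x → Y mapping concrete, here is a minimal sketch of supervised learning in Python with scikit-learn. The dataset (handwritten digits) and the choice of classifier are my own illustration, not something from this article:

```python
# Minimal supervised-learning sketch: learn a mapping f(x) -> Y from labeled examples.
# (Illustrative only; any labeled dataset and any classifier would do.)
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)                    # inputs x and known outputs Y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)              # the function approximator
model.fit(X_train, y_train)                            # learn the mapping from examples

print("accuracy on unseen inputs:", model.score(X_test, y_test))
```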

Generally, almost all the applications of deep learning in the spotlight these days belong to this category, such as optical character recognition, speech recognition, image classification and language translation.

*Info section*: If you want to see how I made a food classifier using TensorFlow, just click on this link; either way, have a look at the video below for more clarity.

Indian food classifier

Although supervised learning mostly consists of classification and regression, there are more eye-catching variants as well, including the following:

  • Image segmentation — Given a picture, draw a pixel-level mask on a specific object.
  • Object detection — Given a picture, draw a bounding box around certain objects inside it. If you want to learn more about object detection, just read this article (of course, I wrote that one too :)), but do not forget to check the video below.
Object detection

Unsupervised learning is another branch of deep learning that consists of finding interesting transformations of the input data without the help of any targets, for the purpose of data visualization, data compression or data denoising, or to better understand the correlations present in the data at hand. Many machine learning professionals say that unsupervised learning is the bread and butter of data analytics, and it is often a necessary step in better understanding a dataset before attempting to solve a supervised-learning problem.

Image courtesy: Unsupervised learning

Dimensionality reduction and clustering are very well-known categories of unsupervised learning; do read about them by clicking on the links to further strengthen your understanding of this area of machine learning.
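
As a rough illustration of those two ideas, here is a short unsupervised-learning sketch in Python: PCA for dimensionality reduction and k-means for clustering, run on synthetic data of my own (nothing here comes from the article):

```python
# Unsupervised-learning sketch: no labels, just structure-finding.
# Synthetic data is used purely for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                         # 300 unlabeled points in 10 dimensions

X_2d = PCA(n_components=2).fit_transform(X)            # dimensionality reduction (e.g. for visualization)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_2d)  # clustering into 3 groups

print(X_2d.shape)                                      # (300, 2)
print(np.bincount(labels))                             # how many points fell into each cluster
```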

Now it’s time to talk about the hero of this article: the generative model. A generative model belongs to a class of models for unsupervised learning where, given training data, our goal is to generate new samples from the same distribution.

Image courtesy: CS231n 2017 lecture notes

To train a generative model, we first collect a large amount of data in some domain (e.g., millions of images, sentences, or sounds) and then train a model to generate data like it.

The trick is that the neural networks we use as generative models have a number of parameters significantly smaller than the amount of data we train them on, so the models are forced to discover and efficiently internalize the essence of the data in order to generate it.
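
As a toy stand-in for that idea, the snippet below fits a small Gaussian mixture (far fewer parameters than data points) to some 2-D data and then samples new points from the learned distribution. This is my own illustration; real generative deep learning uses neural networks rather than a Gaussian mixture:

```python
# Toy generative model: fit a compact distribution to data, then sample new data like it.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, (500, 2)),   # 1000 "training" points
                       rng.normal(+2.0, 0.5, (500, 2))])

gmm = GaussianMixture(n_components=2).fit(data)           # a handful of parameters vs. 1000 points
new_samples, _ = gmm.sample(100)                          # generate 100 new points from the model

print(new_samples.shape)                                  # (100, 2)
```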

Generative models have many short-term applications. But in the long run, they hold the potential to automatically learn the natural features of a dataset, whether categories or dimensions or something else entirely.

Alright, that’s a lot of literature; let’s now talk for just a little bit about formalism.

The fundamental difference between Discriminative models and Generative models is:

Discriminative models learn the (hard or soft) boundary between classes

Generative models model the distribution of individual classes

A generative model is one that can generate data. It models both the features and the class (i.e. the complete data).

If we model P(x,y), we can use this probability distribution to generate data points; hence all algorithms modelling P(x,y) are generative.

Examples of generative models:

  • Naive Bayes models P(c) and P(d|c), where c is the class and d is the feature vector.
  • Also, P(c,d) = P(c) * P(d|c).
  • Hence, Naive Bayes in some form models P(c,d).
  • Bayes nets

A discriminative model is one that can only be used to discriminate/classify data points. You only need to model P(y|x) in such cases (i.e. the probability of the class given the feature vector).

Examples of discriminative models:

  • logistic regression
  • Neural Networks

In simple terms: a generative model is a model of the conditional probability of the observable X given a target y, symbolically P(X|Y=y), which together with the class prior P(Y) gives the joint distribution P(X,Y); a discriminative model is a model of the conditional probability of the target Y given an observation x, symbolically P(Y|X=x).
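
Here is a rough sketch of that contrast in Python, using Gaussian Naive Bayes as the generative model and logistic regression as the discriminative one. The data and the way I sample a synthetic point are my own illustration (GaussianNB stores the per-class means and variances, so we can draw from P(d|c); logistic regression gives us only P(c|d)):

```python
# Generative vs. discriminative on the same data (illustrative only).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=400, centers=2, random_state=0)

gen = GaussianNB().fit(X, y)               # learns P(c) and P(d|c), hence the joint P(c, d)
disc = LogisticRegression().fit(X, y)      # learns only the decision boundary, i.e. P(c|d)

# Because the generative model keeps P(d|c), we can sample a synthetic feature vector for class 0:
rng = np.random.default_rng(0)
fake_point = rng.normal(gen.theta_[0], np.sqrt(gen.var_[0]))

print("synthetic point for class 0:", fake_point)
print("discriminative model's prediction for it:", disc.predict([fake_point])[0])
```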

So, with all the basics covered and all the technical terms explained, it’s now time to turn your eyes towards the applications of generative models and see how AI is helping, and will keep helping, us humans to be more creative.

I know you must be guessing that, like everybody else, this author will now explain implicit models such as generative adversarial networks (GANs), explicit deep autoregressive models such as PixelCNN, or maybe even deep latent variable models such as variational autoencoders. But you are all wrong, so just relax, calm your brain down and enjoy what you’re about to read.

Now, in the last part of this post, I’ll introduce you all to the project “Magenta”. Magenta is a research project exploring the role of machine learning in the process of creating art and music.

Primarily this involves developing new deep learning and reinforcement learning algorithms for generating songs, images, drawings, and other materials. But it’s also an exploration in building smart tools and interfaces that allow artists and musicians to extend (not replace!) their processes using these models. Magenta was started by some researchers and engineers from the Google Brain team but many others have contributed significantly to the project.

The first model, and my personal favorite, is Sketch-RNN, a generative model for vector drawings: a recurrent neural network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes. The authors outline a framework for conditional and unconditional sketch generation and describe new robust training methods for generating coherent sketch drawings in a vector format.

In the demo right below, look at the yoga poses generated by moving through the learned representation (latent space) of the model trained on yoga drawings. Notice how it gets confused at around 10 seconds, when it moves from standing poses towards poses done on a yoga mat.

Sketch-RNN
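
The mechanism behind that demo is latent-space interpolation: take the latent codes of two drawings and move smoothly between them, decoding a sketch at every step. The snippet below is a generic sketch of that idea, not the actual Sketch-RNN/Magenta API; `decode` stands in for whatever decoder a trained model would provide:

```python
# Generic latent-space interpolation sketch (not the real Sketch-RNN API).
import numpy as np

def interpolate(z_start, z_end, steps=10):
    """Linearly blend between two latent vectors."""
    return [z_start + t * (z_end - z_start) for t in np.linspace(0.0, 1.0, steps)]

# Hypothetical latent codes for two yoga poses (in practice these come from an encoder).
z_standing_pose = np.random.default_rng(0).normal(size=128)
z_mat_pose = np.random.default_rng(1).normal(size=128)

for z in interpolate(z_standing_pose, z_mat_pose, steps=5):
    # drawing = decode(z)   # a trained decoder would turn each blended z into a new sketch
    print(z[:3])            # here we just print the first few blended latent values
```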

The second model is dedicated to the music lovers; it is named MusicVAE. When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work.

Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but MusicVAE, a machine learning model, now lets them create palettes for blending and exploring musical scores. The demo is right below.

MusicVAE

I have exposed you to only two of the many models built under project Magenta, and by this time you must have experienced how AI can help us extend, not replace, our creative process. To learn more about these generative models, do explore the pointers in the References section.

REFERENCES:

  1. “A Neural Representation of Sketch Drawings”, the paper on Sketch-RNN.
  2. “A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music”, the paper on MusicVAE.
  3. “Draw Together with a Neural Network”, Google AI blog.
  4. “MusicVAE: Creating a palette for musical scores with machine learning”, Google AI blog.
  5. For more information, watch this video from Douglas Eck (Principal Research Scientist, Google Brain Team; lead on http://g.co/magenta).

Thank you for your attention

That you spend your time reading my work means the world to me. I fully mean that.

If you liked this story, go crazy with the clap (👏) button! It will help other people find my work.

Also, follow me on Medium, LinkedIn or Twitter if you want to! I would love that.


Electronics Engineer by degree, ML engineer by interest, Hardware tinkerer by choice