
Converting deep learning research papers to useful code

If deep learning is a superpower, then turning the theories from a paper into usable code is a hyperpower.

Image by author

Why should I learn to implement machine learning research papers?

As I’ve said, being able to convert a paper into code is definitely a hyperpower, especially in a field like machine learning, which is moving faster every day.

Most research papers come from people at giant tech companies or universities, often PhD holders or people working on cutting-edge technologies.

What could be cooler than being able to reproduce the research done by these top-notch professionals? Another thing to note is that people who can reproduce research papers as code are in huge demand.

Once you get the knack of implementing research papers, you will be on par with these researchers.

These researchers, too, acquired their skills through the practice of reading and implementing research papers.

How do I read and implement papers?

You might say, "Hm, I have a general understanding of deep learning algorithms like fully connected networks, convolutional neural networks, and recurrent neural networks, but I would like to develop SOTA (state-of-the-art) voice cloning AI and I know nothing about voice cloning 🙁".

Okay, here is your answer (parts of my method are taken from Andrew Ng’s advice on reading papers).

If you want to learn about a specific topic:

  1. Collect 5–6 papers related to the specific topic (you can simply search arXiv or similar sites to find papers on a topic).
  2. Don’t read any single paper completely; instead, skim through all of them and pick one that interests you. If you had a specific paper in mind, go pick it up; no one can stop you.
  3. Read the abstract carefully to understand the idea at a high level and see whether your interest persists. If so, skim through the figures and see whether you can make assumptions about what the paper might be about.
  4. Now read the introduction carefully, line by line, because most of what the paper contains is explained there in the simplest manner with minimal math.
  5. If you wish, you can skip the math equations in the first pass; but if the Greek letters are familiar to you, don’t skip the math.
  6. Whenever you get stuck or some words are confusing, never hesitate to google them. No one is born a master of everything 😉
  7. After completing the first pass, you will understand the high-level view of what the paper is trying to prove or improve.
  8. In the second pass, try to understand almost everything in the paper, and if you encounter any pseudo-code, try to convert it into your Python library of choice (PyTorch, TensorFlow…).
  9. You can find more papers to read and get a better understanding of the field by going to the references section of each paper (like connecting the dots).

💡 Some tips for effectively understanding a paper:

  • If you are a beginner at reading research papers, it’s good to read some blog posts and videos about the topic or paper before reading the paper itself. This makes your job easier, and you will not be discouraged by all those Greek letters.
  • Always take notes and highlight important points in the research paper for easier reference while implementing the code.
  • If you are new to implementing research papers and get stuck anywhere, it’s not a bad idea to go through open-source implementations and see how others have done it.

Note: Don’t make the third point a regular practice, because your learning curve will flatten and you will over-fit 🙂

You should develop your own method for reading and implementing papers, and that can happen only by getting started; the steps above will help you get started.

According to Andrew Ng, if you can read 5–10 papers on a topic (e.g., voice cloning), you will be in a good position to implement a voice cloning system, but if you can read 50–100 papers on that topic, you will be in a position to do research on or develop cutting-edge technology for it.

Let’s discuss a paper

  1. High-level overview

Now that you have an understanding of how to read papers, let’s read and implement one ourselves.

We will be going through Ian Goodfellow’s paper – Generative Adversarial Nets (GAN) – and implementing it with PyTorch.

A high-level overview of the paper’s contents is clearly given in its abstract.

The abstract tells us that the researchers are proposing a new framework that contains two neural networks, called the generator and the discriminator.

Don’t get confused by the names; they are just the names given to the two neural networks.

But the main point to note in the abstract is that the generator and the discriminator compete against each other.

Okay, let me make this a little clearer.

Let’s take the example of using GANs to generate new human faces that don’t exist.

The generator generates new human faces with the same dimensions (H×W×C) as the real images and shows them to the discriminator, which says whether each image is a fake generated by the generator or a real image of a human.

Now you may have a question, "Hm, how does this discriminator discriminate between real and fake images?", and here’s your answer:

The discriminator solves an image classification problem: it has to tell whether an image is fake or real (0 or 1). So we can train the discriminator the same way we would train a dogs-vs-cats classifier, but with fully connected networks instead of a convolutional neural network, since that is what the paper proposed.

💡 DCGAN is another type of GAN that uses convolutional neural networks instead of fully connected networks and achieves better results.

So we train the discriminator by feeding it an image, and it outputs a value between 0 (fake) and 1 (real).

Once we have trained the discriminator, we pass the images generated by the generator to the discriminator, which classifies them as real or fake. The generator adjusts all of its weights until it is able to fool the discriminator into predicting that the generated image is real.

We will feed the generator a random probability distribution (a random tensor); the generator’s duty is to change this probability distribution to match the probability distribution of the real images.

These are the steps that we should follow while implementing the code:

  • Load the data set containing real images.
  • Create a random two-dimensional tensor (the probability distribution of the fake data).
  • Create a discriminator model and a generator model.
  • Train the discriminator on real and fake images.
  • Feed the random tensor (the fake data) to the generator, and test whether the discriminator can identify the fake image the generator produces.
  • Adjust the weights of the generator (with stochastic gradient descent) until the discriminator fails to identify the fake images.

You may have several doubts, but that’s okay for now; once we implement the theory in code, you will see how it works.

  2. Loss function

Before we implement the code, we need a loss function so that we can optimize our generator and discriminator networks.

The discriminator faces a binary classification problem, so we use binary cross-entropy loss for it; alternatively, you can use the custom loss function discussed in the paper.

Loss function from the paper: V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]

x → real image

z → fake data or noise (a random tensor)

D → discriminator model

G → generator model

G(z) → feeding fake data or noise to the generator (the output is a fake image)

D(x) → feeding a real image to the discriminator (the output is a value between 0 and 1, the probability that the image is real)

D(G(z)) → fake data is fed to the generator, the generator outputs an image, and that image is fed to the discriminator for prediction (again a value between 0 and 1)

If you want to use the loss function from the paper, let me explain it for you:

According to the paper, for the discriminator, we need to maximize the above loss function.

Let’s take the first part of the equation:

D(x) outputs a value between 0 and 1, so when we maximize log D(x), we push the discriminator to output a value close to 1 when x (a real image) is fed to it, and that is what we need.

Now, let’s take the second part of the equation:

G(z) outputs an image with the same dimensions as the real image. This fake image is fed to the discriminator as D(G(z)). When we maximize log(1 − D(G(z))), the discriminator’s output D(G(z)) is pushed toward 0, so 1 − D(G(z)) gets close to 1, and that is exactly what we need when a fake image is passed to the discriminator.

Note: You can add a negative sign to the equation and turn the loss function into a minimization problem for the discriminator, which is easier to work with than maximization.

For the generator, we need to minimize the above equation, but the paper considers only the second part, log(1 − D(G(z))), for minimization.

When the generator minimizes log(1 − D(G(z))), D(G(z)) is pushed toward 1, which is exactly what our generator wants to achieve: fool the discriminator into predicting 1 (real) when it is given a fake image.
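To make the two objectives concrete, here is a minimal sketch of how both losses could be written in PyTorch. The negative sign turns the discriminator’s maximization into a minimization, as in the note above; the small eps constant is my addition for numerical stability, not something from the paper.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # Maximizing log D(x) + log(1 - D(G(z))) equals minimizing its negative.
    # d_real = D(x) and d_fake = D(G(z)) are probabilities in (0, 1).
    return -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()

def generator_loss(d_fake, eps=1e-8):
    # The paper's generator objective: minimize log(1 - D(G(z))).
    return torch.log(1 - d_fake + eps).mean()
```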

Yay, let’s jump into the code!

I’ve done the code implementation in Google Colab, so it will be best if you try the code in Google Colab or a Jupyter notebook.

  1. Import the required libraries.
  2. We will use a single image as the real image for faster training and results, so the image generated by the generator will be similar to this image. You can also use a data set of images; it’s all up to you.

We will load the image as a PIL image using the PIL library, resize it and turn it into a tensor using torchvision transforms, and then create fake noise of size (1×100) for the image generation, as in the sketch below.
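Here is a minimal sketch of steps 1 and 2. The file name face.png and the 64×64 size are placeholder choices of mine; swap in whatever real image you like.

```python
import torch
from PIL import Image
import torchvision.transforms as transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.Resize((64, 64)),  # shrink the image for faster training
    transforms.ToTensor(),        # PIL image -> float tensor in [0, 1]
])

# "face.png" is a placeholder for whatever real image you use
real_image = transform(Image.open("face.png").convert("RGB"))  # shape (3, 64, 64)
real_image = real_image.view(1, -1).to(device)                 # flatten to (1, 12288)

noise = torch.randn(1, 100, device=device)  # z: the (1 x 100) random noise tensor
```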

  3. Create a discriminator model, which is nothing but a fully connected neural network that takes in a real or fake image and outputs a value between 0 (fake) and 1 (real).
  4. Create a generator model, which is also a fully connected network that takes in random noise and outputs an image tensor of the same size as the real image.
  5. Initialize the models, optimizers, and loss function, then move them to the desired device (cuda or cpu). We will use binary cross-entropy loss for the discriminator and the loss function discussed in the paper, log(1 − D(G(z))), for the generator.
  6. Now we train the models: the whole GAN is trained for five hundred epochs, and within each of those epochs the discriminator is trained for four steps and the generator for three.
  7. Compare the generated image with the real image. You can tune the learning rate, momentum, number of epochs, and the layers of the generator and discriminator models to get even better results. A sketch covering steps 3 to 7 follows the comparison figure below.
    Comparing our generator’s image with the real image (Image by author)
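Putting steps 3 to 7 together, here is one minimal sketch of how the models and the training loop could look, reusing the device, real_image, and noise variables from the loading sketch above. The layer sizes, learning rate, and momentum here are illustrative choices of mine, not values prescribed by the paper.

```python
import torch
import torch.nn as nn

img_dim = 3 * 64 * 64  # flattened size of the real image from the loading step
noise_dim = 100

# Step 3: discriminator, a fully connected net; image in, probability out
discriminator = nn.Sequential(
    nn.Linear(img_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),  # output in (0, 1): probability that the input is real
).to(device)

# Step 4: generator, a fully connected net; noise in, image-sized tensor out
generator = nn.Sequential(
    nn.Linear(noise_dim, 256),
    nn.ReLU(),
    nn.Linear(256, img_dim),
    nn.Sigmoid(),  # match the [0, 1] range of the real image tensor
).to(device)

# Step 5: optimizers and the discriminator's BCE loss
d_opt = torch.optim.SGD(discriminator.parameters(), lr=0.01, momentum=0.9)
g_opt = torch.optim.SGD(generator.parameters(), lr=0.01, momentum=0.9)
bce = nn.BCELoss()
real_label = torch.ones(1, 1, device=device)
fake_label = torch.zeros(1, 1, device=device)

# Step 6: 500 outer epochs; 4 discriminator steps and 3 generator steps each
for epoch in range(500):
    for _ in range(4):  # train the discriminator on real and fake images
        d_opt.zero_grad()
        fake_image = generator(torch.randn(1, noise_dim, device=device)).detach()
        d_loss = (bce(discriminator(real_image), real_label)
                  + bce(discriminator(fake_image), fake_label))
        d_loss.backward()
        d_opt.step()
    for _ in range(3):  # train the generator: minimize log(1 - D(G(z)))
        g_opt.zero_grad()
        d_fake = discriminator(generator(torch.randn(1, noise_dim, device=device)))
        g_loss = torch.log(1 - d_fake + 1e-8).mean()
        g_loss.backward()
        g_opt.step()

# Step 7: generate an image from the fixed noise and reshape it for viewing
with torch.no_grad():
    generated = generator(noise).view(3, 64, 64).cpu()
```

Giving the discriminator slightly more update steps than the generator keeps it a step ahead, so the generator always has a meaningful signal to learn from.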

Final thoughts

The generated images may not be of very high resolution, because this paper was just the beginning of the whole world of generative models. If your interest persists, you can go ahead and read DCGAN (deep convolutional GAN) or any other paper from the world of GANs and implement it to see astonishing results, but always remember that this paper is the foundation for all of those papers.

Link to the complete code is available here.

