14 Deep Learning and Machine Learning Uses That Made 2019 a New AI Age.

BigGAN, CycleGAN, StyleGAN, GauGAN, Artbreeder, DeOldify, etc.: My very personal list (incl. BONUS).

Time flows more rapidly than we expect. Good for Moore's law. But there is still a lot to catch up on. In the following, I want to present my list of great things that happened in 2019 (and, sorry for cheating, some from 2018 as well) in the field of Machine Learning and Deep Learning. These are mostly Neural Network-based models that impressed me.

To begin, here is a tweet by Ian Goodfellow that perfectly demonstrates the achievements of Deep Learning.

Even though it covers a specific topic, the progress of Generative Adversarial Networks, this development shows pretty well what has happened and what is to come. A picture is worth a thousand words.

And I have a feeling 2019 was way more intense than the years before. Here are the developments that left me delightedly astonished, in no particular order.

DISCLAIMER 1: I'm new to Towards Data Science. So in case I missed some crucial discussions here on the topics below, feel free to complement this article with relevant links.
DISCLAIMER 2: This is a wild mix of models, networks, implementations, and experiments. They are often interconnected. Some of them are at the core of others. Some of them are already well covered by authors on Towards Data Science. What you see here are the DL/ML stories that impressed me within the last 1–2 years.
DISCLAIMER 3: As you will note, the AI advancements in my list refer mostly to the visual medium (image generation and modification). The year 2019 was also a crucial milestone for Natural Language Processing (GPT-2 by OpenAI etc.), but that's another big topic I will cover soon.

1. BigGAN

What's this about: BigGAN scales up Generative Adversarial Networks and lets you generate new visuals after training on huge image datasets. The heart of the system is a pair of Neural Networks: the Generator and the Discriminator. The Generator creates new visuals and tries to convince the Discriminator that they are real footage. The Discriminator compares the produced images with its "experience" and sends them back to the Generator as "not approved". This iterative interplay continues until some kind of "consensus" is reached.
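To make this interplay concrete, here is a minimal training-loop sketch in PyTorch. The tiny fully connected networks, the batch size, and the random stand-in for "real footage" are purely illustrative assumptions; BigGAN's actual architecture is far larger and class-conditional.

```python
import torch
import torch.nn as nn

# Toy stand-ins for BigGAN's Generator and Discriminator (illustrative only).
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
real_images = torch.randn(32, 784)  # placeholder for a batch of real footage

for step in range(1000):
    # Discriminator turn: approve the real batch, reject the Generator's fakes.
    fake = G(torch.randn(32, 64)).detach()
    loss_d = bce(D(real_images), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator turn: try to get a fresh batch of fakes "approved" as real.
    loss_g = bce(D(G(torch.randn(32, 64))), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```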

Try it out: Using this BigGAN notebook, you can use category-conditional sampling and create, e.g., an image of a valley:
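For reference, this is roughly what happens under the hood of such a notebook, assuming DeepMind's TF1-style BigGAN module on TF Hub (979 is the ImageNet index for "valley"):

```python
import numpy as np
import tensorflow.compat.v1 as tf
import tensorflow_hub as hub
tf.disable_v2_behavior()

module = hub.Module('https://tfhub.dev/deepmind/biggan-256/2')
truncation = 0.4  # lower values trade diversity for fidelity

z = truncation * np.random.standard_normal([1, 140]).astype(np.float32)  # latent noise
y = np.zeros([1, 1000], dtype=np.float32)
y[:, 979] = 1.0  # one-hot condition: ImageNet class 979, "valley"

samples = module(dict(y=y, z=z, truncation=truncation))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    images = sess.run(samples)  # shape [1, 256, 256, 3], values in [-1, 1]
```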

And the results are astonishing:

BigGAN generated images

In the header of my article, you can also see the BigGAN-generated clock.

As you see, it's still Weak AI. The networks don't know what a clock is. They just know how this thing may look: "roundish", "with digits and hands".

I see clear parallels between AI's attempts to interpret the world and Plato's Theory of Forms/Ideas:

Ideas or Forms are the metaphysical essences of material things. The material things are not originals, but merely imitations of the Ideas/Forms.

My coverage of this topic (Friend-Link):

2. Metamorphosis with BigGAN.

What's this about: We can go further than just generating labeled images. We can merge and metamorphose things with BigGAN using its interpolation function. In the case of BigGAN, a transformation of generated image A into generated image B is possible, however semantically alien they are to each other.

Try it out: Use the same BigGAN notebook with the Interpolation function.

Using these settings, you can transform a Yorkshire terrier into a Space Shuttle.
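In essence, the notebook walks both the latent vector and the class vector from A to B and renders every intermediate point. Here is a small sketch of that idea (187 and 812 are, to my knowledge, the ImageNet indices for "Yorkshire terrier" and "space shuttle"; the vectors feed the BigGAN module from the snippet above):

```python
import numpy as np

def one_hot(index, depth=1000):
    v = np.zeros([depth], dtype=np.float32)
    v[index] = 1.0
    return v

def interpolate(a, b, num_steps=8):
    """Linearly interpolate between two vectors (latent z or one-hot class y)."""
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None]
    return a[None] * (1.0 - alphas) + b[None] * alphas

truncation = 0.4
z_a = truncation * np.random.standard_normal(140).astype(np.float32)
z_b = truncation * np.random.standard_normal(140).astype(np.float32)

zs = interpolate(z_a, z_b)                    # path through latent space
ys = interpolate(one_hot(187), one_hot(812))  # Yorkshire terrier -> space shuttle
# Feeding each (zs[i], ys[i]) pair to the BigGAN module yields one frame
# of the metamorphosis; stitched together, they become animated footage.
```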

This method opens up possibilities never seen before, beyond human imagination. You can even produce more gradual frames and combine them into animated footage (check out the Colab Notebook by Zaid Alyafeai for more).

My coverage of this topic:

3. Style Transfer

What's this about: Neural Style Transfer allows another kind of image modification: the style of image A is transferred to image B.

Try it out: Colab Notebook. There are also various free and commercial DL-based apps converting your images into artworks of world art masters.
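If you'd rather stay in code, a compact way to try style transfer is the arbitrary-stylization model that Google's Magenta team published on TF Hub (the file names below are examples; the style image works best resized to around 256 x 256):

```python
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')

def load_image(path):
    img = tf.io.decode_image(tf.io.read_file(path), channels=3, dtype=tf.float32)
    return img[tf.newaxis, ...]  # add a batch dimension; values in [0, 1]

content = load_image('userpic.jpg')  # image B: the photo to repaint
style = load_image('van_gogh.jpg')   # image A: the artwork providing the style

stylized = hub_module(tf.constant(content), tf.constant(style))[0]
Image.fromarray((stylized[0].numpy() * 255).astype('uint8')).save('stylized.png')
```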

I style-transferred my userpic with various artists' styles, with convincing results:

Van Gogh
Magritte
Kurt Schwitters

You are probably familiar with Style Transfer, since Towards Data Science provides some great articles about this topic (here are just some of them, read all of them):

Even a Video Style Transfer is possible:

Artist Gene Kogan applied Style Transfer to Disney's Alice in Wonderland (that tea party scene) and transferred the styles of 17 famous artworks to the animation:

My coverage of this topic (Friend-Link):

Recently, StyleGAN2 entered the stage, with improved image quality (GitHub / Paper / Video).

StyleGAN 2 teaser (source)

New ways of working with images are also possible. For example, StyleGAN projection: projecting any possible image onto the model's latent space to find its closest generated match. I highly recommend following Jonathan Fly and his comprehensive experiments with every new DL/ML development:

Roadrunner01 is also highly interesting to follow:

(AI Pioneers, Artists and Experimenters is another topic I will cover in 2020.)

4. Creative use of Style Transfer: Deep Painterly Harmonization

What's this about: Some artists and developers use the abilities of Style Transfer for creative image manipulation. The idea is as simple as it is ingenious:

1. take a target image B
2. transfer its style to the element A you want to build into B
3. combine and enjoy

This method allows, for example, an artistic use of style transfer in digital image collages.
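A crude approximation of this recipe can be assembled from off-the-shelf parts. To be clear, this is not Luan et al.'s two-pass optimization; it merely reuses the Hub stylization model from the previous section, and the file names and paste position are made-up examples:

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

hub_module = hub.load('https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2')

def to_tensor(img):
    # PIL image -> float32 batch tensor with values in [0, 1]
    return tf.convert_to_tensor(np.array(img), dtype=tf.float32)[tf.newaxis, ...] / 255.0

painting = Image.open('target_painting.jpg').convert('RGB')  # target image B
element = Image.open('element.jpg').convert('RGB')           # element A to embed

# Step 2: restyle element A with the painting's overall style.
stylized = hub_module(to_tensor(element), to_tensor(painting))[0]
restyled = Image.fromarray((stylized[0].numpy() * 255).astype('uint8'))

# Step 3: paste the harmonized element into the painting and enjoy.
painting.paste(restyled, (120, 80))  # position chosen arbitrarily
painting.save('collage.jpg')
```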

Paper: by Fujun Luan et al. In their GitHub repository, you can find amazing samples:

Gene Kogan applied Style Transfer in selfie mode, embedding himself throughout world art history:

My coverage of this topic (Friend-Link):

5. Comixify — Converting a Video back to a Story Board

What's this about: A group of researchers at the Warsaw University of Technology, fascinated by both AI and comic art, combined their passions into an amazing implementation (read the paper for more details):

1. The model analyzes the video using intelligent video summarization.
2. The scenes in the video footage are separated by DL-powered detection of the frames with the most aesthetic impact.
3. Style Transfer is applied for the specific stylization of the images.
4. The selected frames are arranged into a storyboard / comic layout (a rough skeleton of this pipeline follows below).
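The team's actual components are trained networks; the skeleton below only mirrors the four stages, with trivial stand-ins for them (image contrast as the "aesthetic" score, posterization as the "comic style"):

```python
import cv2
import numpy as np
from PIL import Image, ImageOps

def extract_frames(path, every_n=24):
    """Stage 1: sample candidate frames from the video with OpenCV."""
    cap, frames, i = cv2.VideoCapture(path), [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        i += 1
    cap.release()
    return frames

def aesthetic_score(img):
    # Stand-in for Comixify's learned frame ranking: prefer high contrast.
    return float(np.asarray(img).std())

def comic_page(video_path, rows=2, cols=3, w=320, h=180):
    frames = extract_frames(video_path)
    # Stage 2: keep the frames with the highest (stand-in) aesthetic score.
    best = sorted(frames, key=aesthetic_score, reverse=True)[: rows * cols]
    # Stage 3: stand-in stylization; the real model is a style-transfer GAN.
    panels = [ImageOps.posterize(f.resize((w, h)), 3) for f in best]
    # Stage 4: arrange the panels on a comic page.
    page = Image.new('RGB', (cols * w, rows * h), 'white')
    for k, p in enumerate(panels):
        page.paste(p, ((k % cols) * w, (k // cols) * h))
    return page
```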

Try it out: Comixify lets you select a video and turn it into a comic.

I am a huge fan of Tarkovsky's movies, so I was eager to see what would happen with this supercut of "Stalker":

And the results were stunning:

Especially if you know (and love) the movie, you will see how well the frames are chosen. The result actually captures the core idea of Stalker (without spoiling the film).

My coverage of this topic with more experiments (Friend-Link):

6. CycleGAN — image-to-image translation without input-output pairs.

What's this about: while BigGAN generates new images from its training data, and Style Transfer moves a style between two images, CycleGAN takes a single image and translates its style or features into something different. It is, precisely as the paper is titled, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks:

1. The image is analyzed by the GAN (including pattern and object detection).
2. A pre-trained feature modification is applied.
3. The result is the same image as in step 1, with the new visuals from step 2.

CycleGAN changes the styles and visual features of an image without reference to other images.
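The trick that makes unpaired training work is the cycle-consistency loss: translating an image to the other domain and back must recover the original. Here is a minimal sketch of that loss in PyTorch, with toy convolutions standing in for CycleGAN's real ResNet-based generators:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two generators (X -> Y and Y -> X).
G = nn.Conv2d(3, 3, 3, padding=1)  # e.g. horse -> zebra
F = nn.Conv2d(3, 3, 3, padding=1)  # e.g. zebra -> horse
l1 = nn.L1Loss()

x = torch.rand(1, 3, 256, 256)  # an unpaired horse photo
y = torch.rand(1, 3, 256, 256)  # an unpaired zebra photo

# Cycle consistency: there-and-back translation must reproduce the input.
# This constraint replaces the need for paired training examples.
loss_cycle = l1(F(G(x)), x) + l1(G(F(y)), y)
# The full objective adds adversarial losses from two discriminators,
# one per domain.
```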

Papers: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (Main Page / GitHub / Paper)

It can not only transfer an artist's pre-trained style onto a photo; it can also turn a painting into a photo-realistic image, using pre-trained knowledge about segmentation features:

Source: Paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

You can even apply CycleGAN to video footage. In this case, the "horse => zebra" transfer is applied to a moving picture:

(Source)

Not without some systematic flaws though…

Putin zebrafied (Source)

There are many relevant articles on Towards Data Science, so I won't take a deep dive into the technical details. The most important thing for me is that with DL, the modifiability of images has reached a new, advanced level. Suitable for artistic use. Dangerous because of DeepFake misuse.

My coverage of this topic (Friend-Link):

7. StyleGAN trained on paintings

What's this about: _C0D32_ over at Reddit trained StyleGAN on the 24k Artwork Dataset from Kaggle. With his modified code, new artworks in various styles are produced. Here's how it works:

StyleGAN generates original artworks with pre-trained artistic styles.

Try it out: Google Colab Notebook.

An interesting thing: even though you get countless unique artworks from this model, with some knowledge of art history you can guess which styles, art movements or even artists are shimmering through the new images.

Two generated images versus “Las Meninas”, by Diego Velázquez

Or mimesis of pointillist art.

Generated image versus Georges Seurat (right image)

My coverage of this topic, including more images and analysis (Friend-Link):

8. pix2pix: Image-to-image translation.

What's this about: pix2pix was developed by Phillip Isola et al. and went viral in 2017 at the latest. The image translation, done by Conditional Adversarial Networks, made it possible to render human-made doodles into photo-realistic (or something of that kind) images. Watch this famous Two Minute Papers episode by Károly Zsolnai-Fehér for a summary:

Papers: on the pix2pix website you can find documents, papers, examples, and demonstrations of this method (GitHub / Paper).

Try it out: Christopher Hesse provides a live TensorFlow demonstration of pix2pix. You can "translate" your doodles of cats, facades, and other things into "photo-realistic" images.

pix2pix trying to make the best of my amateurish doodle of a cat.

It is surely way more than just funny sketch translation: with pre-defined settings, you can transform an aerial photo into a map, a daylight photo into a night view, etc. Conditional Adversarial Networks detect the patterns and translate them into the demanded domain (you have to define your target image task). The networks are trained on specific paired image datasets.
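The "conditional" part is what distinguishes pix2pix from the unpaired CycleGAN above: the discriminator judges input and output together, and an L1 term keeps the output close to the paired ground truth. A toy sketch of that objective (the real model uses a U-Net generator and a PatchGAN discriminator; the weight 100 on the L1 term follows the paper):

```python
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, 3, padding=1)  # toy generator: sketch -> photo
D = nn.Conv2d(6, 1, 3, padding=1)  # toy discriminator: judges (input, output) pairs
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

sketch = torch.rand(1, 3, 256, 256)  # e.g. a cat doodle
photo = torch.rand(1, 3, 256, 256)   # its paired ground-truth photo

fake = G(sketch)
pair = torch.cat([sketch, fake], dim=1)  # conditioning: D sees input and output
score = D(pair)
# Generator objective: fool D *and* stay close to the real photo (L1 term).
loss_g = bce(score, torch.ones_like(score)) + 100.0 * l1(fake, photo)
```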

NVIDIA brought this method to another level with GauGAN, one of the experiments in their AI Playground. You draw sketches using segmentation maps: every color corresponds to a specific object or material. After translation, a new image is generated (with a CycleGAN-like possibility to switch between various visual features):

Image doodled by my daughter for our AI-driven fairy tale book.

9. pix2pix, face2face, DeepFake and Ctrl+Shift+Face

The Deep Learning world is full of experimentation. People think outside the box, and that's the most inspiring thing about DL specifically and AI in general. Gene Kogan experimented with a dynamic pix2pix: in this case, the source was not a sketch but a webcam (his face), and the target was trained on Trump photos:

These experiments inspired researchers (in this case Dat Tran) to build face2face:

The principle is smart:

1. The face2face model learns facial features / landmarks.
2. It scans the webcam input for those facial landmarks.
3. It finally translates them into another face (a sketch of the landmark step follows below).
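The landmark extraction is commonly done with dlib, which is also what Dat Tran's demo builds on; here is a minimal sketch of that stage (the .dat file is dlib's standard pretrained predictor, downloaded separately):

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

cap = cv2.VideoCapture(0)  # webcam input
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    shape = predictor(gray, face)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # Step 3 (not shown): a pix2pix-style generator trained on the target
    # person translates this landmark sketch into the target's face.
cap.release()
```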

Another frontier of the post-truth epoch has been reached: now we can modify not only images but also moving pictures. Like the AR applications in popular messaging apps, AI interprets the video footage and modifies it, almost perfectly.

Artists like Ctrl+Shift+Face have perfected this method to an unbelievable level, playfully switching the faces of actors in cult movies with the help of face2face. Like here: "The Shining" with Jim Carrey. A side-by-side comparison shows how well even the fine psychological acting can be translated.

Such implementations bear manifold possibilities:

Filmmakers can experiment with actors before the audition. They can also localize movies for better lip-synchronization in various languages, as Synthesia did with David Beckham.

Now imagine these possibilities for international video conferencing using AI-driven language translation and speech synthesis.

Artists can produce subversive and surreal "Being John Malkovich"-like masterpieces. For example, this inspiring, bright and slightly absurd cover of "Imagine", sung by Trump, Putin, et al.:

Deceased persons can be revived. The best example is singer Hibari Misora, who performed a new song at the annual Japanese New Year TV event NHK Kōhaku Uta Gassen (NHK紅白歌合戦). She did it this week, even though she died 30 years ago. The visuals were reconstructed with the help of AI; the voice was simulated by Vocaloid:

But new paths for DeepFake misuse are also open. Remember ZAO, the Chinese DeepFake fun app that swaps your face with a celebrity's? Now you are Leonardo DiCaprio. Combine it with the biometric payment systems rolling out in China, and you have endless possibilities for fraud:

My coverage of this topic (Friend-Link):

10. 3D Ken Burns Effect.

What's this about: this model, developed by Simon Niklaus, converts single images into tracking shots. The model recognizes the background, estimates depth, fills the missing areas with context-aware inpainting, and adds new angles. In short: from a single image, you can produce spatial 3D video footage. Read more about it in Andy Baio's blog.
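The pipeline's first ingredient is monocular depth estimation. Just to illustrate that step, here is a sketch using the openly available MiDaS model via torch.hub as a stand-in for Niklaus's own depth network; the point-cloud rendering and inpainting stages are not shown:

```python
import cv2
import torch

# Open MiDaS depth model (stand-in for the paper's own network).
midas = torch.hub.load('intel-isl/MiDaS', 'MiDaS_small')
transform = torch.hub.load('intel-isl/MiDaS', 'transforms').small_transform
midas.eval()

img = cv2.cvtColor(cv2.imread('photo.jpg'), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().numpy()  # per-pixel relative depth
# From this depth map, the 3D Ken Burns pipeline builds a point cloud,
# inpaints the disocclusions that open up, and renders frames along a
# virtual camera move.
```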

Try it out: Colab Notebook by Manu Romero.

Here are some of my experiments (in a Twitter thread):

I wrote about this topic (Friend-Link):

11. ArtBreeder: unlimited artwork generation

What's this about: Joel Simon built BigGAN and other models into the user-friendly web application ArtBreeder. You have many diverse possibilities to create and modify faces, landscapes, general images, etc. Artbreeder has meanwhile grown into a vivid community, where users and developer are in continuous dialogue.

Try it out: https://artbreeder.com/

Here are some of the samples I did with ArtBreeder:

12. DeOldify — colorization of Black&White photos

What's this about: DeOldify was created and released by Jason Antic. The mission of this project is to colorize and restore old images and film footage (source). DeOldify uses Generative Adversarial Networks with the iterative interplay between the two Neural Networks, Generator and Discriminator (like in ArtBreeder). But unlike the previous models, the images in DeOldify aren't modified or generated in their form. The power of the GAN brings the colors: the Generator applies pigments to the objects it was trained to recognize, and the Discriminator tries to criticize the color choice.

Try it out: You can find the model on GitHub, and also in two notebooks: one for images (Colab Notebook) and one for videos (Colab Notebook)!
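If you run it locally instead, the repository exposes a small Python API, which its own notebooks use roughly like this (assuming a clone of the DeOldify repo; the photo path is an example):

```python
# Run from inside a clone of the DeOldify repository.
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)  # or DeviceId.CPU if you have no GPU

from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)  # loads the pretrained GAN weights
# render_factor sets the resolution at which the GAN sees the image;
# the notebooks suggest values around 35 for most photos.
colorizer.plot_transformed_image(path='my_bw_photo.jpg', render_factor=35)
```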

Here is my take: a b&w photo my father took back in the 1960s. Look how precisely the colors of the flowers are detected.

Surely, the colors don't reproduce the original palette. But it revitalizes historical photos, bringing them way nearer to our times.

I wrote on this topic here (Friend-Link):

13. Virtual Reality powered by AI

What's this about: AI-powered Virtual Reality is possible. Actually, this is one-year-old news from NVIDIA, and a very promising one:

Here the city and visuals are trained on Google StreetView, so the VR city experience is reconstructed by a Deep Learning model:

For training, the team used NVIDIA Tesla V100 GPUs on a DGX-1 with the cuDNN-accelerated PyTorch deep learning framework, and thousands of videos from the Cityscapes and Apolloscapes datasets (source).

You can imagine all the potential of this approach: realistic city simulations "from scratch", assistance for urban development, traffic management and logistics, reshaping video game landscapes.

14. Runway ML

What's this about: Runway is the ultimate application using a variety of ML/DL models for diverse needs. It can translate images to text, generate text from images (using GPT-2), and detect objects in photo and video footage. You can also combine various models into chain reactions. And it's free.

Try it out: https://runwayml.com/

I will write about RunwayML soon.

BONUS: Resources to try it out

The AI Winter is (hopefully) finally over. The technology is here, we are connected, and the exchange of ideas is as buoyant as never before. And the best thing about the AI Renaissance is the popularization and democratization of DL/ML. Nowadays, not only Python speakers and NVIDIA GPU owners can enjoy the seemingly endless possibilities: everybody can. Writers, artists, and people from other non-tech fields can use Colab/Jupyter Notebooks, user-friendly applications like Artbreeder and RunwayML, etc.

Here is just a small list for you to bookmark: cross-platform, web-based AI experiments. Now there is no excuse for claiming AI is rocket science (it still is in various high-tech sectors, but not in general). This list will be updated.

You can also follow my Twitter list of AI-relevant key persons. The list is growing, and that's a great thing!

These are just some of the amazing AI developments of 2019. Do you have other AI-driven WOWs? Feel free to share them in the comments!


Written by Vlad Alex (Merzmensch)

Vladimir Alexeev. Futurist. AI-driven Dadaist. Living in Germany, loving Japan, AI, mysteries, books, and stuff. https://www.linkedin.com/in/v-alexeev/
