Bot Ross: Teaching a Computer to Paint

Simon Carryer
Towards Data Science
10 min read · Sep 22, 2019


I am not, I will admit, a terribly good artist. Not with a brush and paint at least. I can make a credible effort with a pen or pencil, provided you’re not after something tricky like human hands or horses. But break out the oil paints and it’s all over — the best I can produce is some indecipherable blobs in garish colours. Despite, or perhaps because of, this complete lack of ability with a brush, I find celebrity TV painter Bob Ross completely enthralling. His gentle delivery and effortless manner are a soothing balm in a hectic world. One of the more charming aspects of Ross’s delivery is his assurance that, given enough practice, anyone could learn to paint.

“Talent is a pursued interest. Anything that you’re willing to practice, you can do.” — Bob Ross

It’s a hugely appealing idea, and it’s a sentiment I tend to share. I’m a firm believer in hard work and practice over some nebulous concept of “talent”. And Ross seems so at peace while painting, so lost in the tiny bucolic Alaskan wilderness he’s creating on the canvas. I want to explore that wilderness too. But I’m a busy man. I have a lot of work to do, and a lot of essays to write. Can I take a shortcut? Bob Ross says if I’m willing to practice, I can paint just like him. But can I get an algorithm to do that practice for me? Can I teach a computer to paint?

This is a tricky problem for artificial intelligence. Can we structure it in such a way as to make it solvable for a computer? What does it mean to teach a computer to paint? What I want is a system that, given only minimal input from me, can produce a novel painting. I want to short-circuit the process of practice: something that can turn my garish blobs into something resembling a Bob Ross painting, without the bother of having to learn to do it myself.

I call it “Toothpaste Mountain”

Artificial intelligence algorithms work, for the most part, on the principle of prediction. Given an input, they predict the output most likely to meet some desired criteria, the output that minimises error. In this case, we want an algorithm that, given one of my blobby creations, can predict (and display) what the painting would look like had it been painted by Bob Ross himself. It needs to translate, effectively, between my semi-incompetent input and a Bob-Ross-alike output.

We’re getting closer to turning this into a solvable problem. What we need is a set of training data — a set of blobby paintings paired with a real Bob Ross equivalent. To produce that, we’ll need to go through a painstaking process of reverse engineering — taking completed Bob Ross paintings, and creating a blobby equivalent for each of them. We’ll take all of the elements of a Bob Ross painting — majestic mountains, quiet rivers, cosy cabins, sweeping clouds, and happy little trees — and render each of them as simple blobs of colour in vaguely the right shape. I call the style “Blob Ross”.

(The original is on the left)

With the help of a good friend, I was able to produce just over 250 Blob Ross paintings, paired with the Bob Ross originals, a small but hopefully sufficient training dataset. This is the data that the computer will learn from, inferring that, for example, a patch of royal blue should be rendered as some dramatic Alaskan mountains, while a strip of cyan should become a winding river.
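For the curious, here is roughly what that paired dataset looks like to the computer. This is a minimal sketch in PyTorch (my choice of framework, not a detail from the project), assuming the blobby images and the originals live in hypothetical "blob" and "bob" folders with matching filenames, resized to 256 by 256 pixels.

```python
# A minimal sketch of loading paired Blob Ross / Bob Ross images.
# The folder layout and image size are assumptions for illustration.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class BlobBobDataset(Dataset):
    """Pairs each 'Blob Ross' input with its matching Bob Ross original."""
    def __init__(self, root="data"):
        self.blob_paths = sorted(Path(root, "blob").glob("*.png"))
        self.to_tensor = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),                       # scale to [0, 1]
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # then to [-1, 1]
        ])

    def __len__(self):
        return len(self.blob_paths)

    def __getitem__(self, idx):
        blob_path = self.blob_paths[idx]
        bob_path = blob_path.parent.parent / "bob" / blob_path.name
        blob = self.to_tensor(Image.open(blob_path).convert("RGB"))
        bob = self.to_tensor(Image.open(bob_path).convert("RGB"))
        return blob, bob
```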

Now we just need an algorithm that can learn to solve this problem. The neural networks I introduced in my last couple of essays are not going to cut it in this case. As we’ve learned, these algorithms, like most machine learning algorithms, work on the principle of minimising error. They produce a result that will be the least wrong, on average, across the whole of the training set. But that’s a problem. Let me explain why.

So majestic

Here are some mountains from a few Bob Ross paintings. While they’re all recognisably in his style, they’re also quite dramatically different — they are different colours, they have different patterns of light and shade, and they’re very different shapes. A traditional algorithm attempting to predict this would try to split the difference, looking for the colour and shape exactly in the middle of all Ross’s mountains. Most likely it would produce a khaki smudge. But there’s a better way.
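To make the "khaki smudge" concrete: if a model is scored purely on average pixel error, the single best answer to several very different targets is just their per-pixel average. A toy example (in PyTorch, with made-up colour values) shows the muddy result.

```python
import torch

# Three imaginary single-pixel "mountains" in very different colours (RGB in [0, 1]).
targets = torch.tensor([
    [0.8, 0.2, 0.1],   # reddish
    [0.1, 0.3, 0.9],   # blueish
    [0.4, 0.6, 0.2],   # greenish
])

# The single prediction that minimises mean squared error against all three
# targets is simply their average: a muddy, washed-out colour.
prediction = targets.mean(dim=0)
print(prediction)   # tensor([0.4333, 0.3667, 0.4000]) -- a khaki-ish grey
```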

Most of the techniques and algorithms used in artificial intelligence are, relatively speaking, quite old. Neural networks, like the one I used to identify medieval pole arms, were (arguably) first described in the 1940s. Many of the other techniques used in modern machine learning are even older. But the technique we’re going to use in this essay is a genuinely recent invention: the “Generative Adversarial Network” or “GAN”.

The GAN was first proposed only a handful of years ago, in 2014. GANs are responsible for some of the most exciting, and the most terrifying recent innovations in computer-generated images, including “deepfake” technologies that can seamlessly paste a new face onto an actor’s (often unclothed) body.

While GANs build on the concept of the neural network, they also introduce an extremely interesting innovation. Instead of just one neural network, the GAN employs two, working in a kind of competition with each other.

The first network is a “generator”. For given input data — in our case our Blob Ross paintings — it produces a predicted image — what it thinks Bob Ross would make from that input. I call it “Bot Ross”. Like other neural networks, the generator starts off making its choices at random, and learns to make better choices over time, by getting feedback on its success. But what makes it different is where it gets that feedback from.
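The essay doesn’t spell out the generator’s architecture, but a minimal sketch helps show the shape of the idea: a network that takes a blobby image in and pushes a full-sized painting out. The simple encoder-decoder layout below is an assumption on my part, standing in for the U-Net-style generators commonly used for this kind of image-to-image translation.

```python
import torch.nn as nn

class Generator(nn.Module):
    """A deliberately small stand-in for the generator: blob in, painting out."""
    def __init__(self):
        super().__init__()
        # Encoder: squeeze the blobby input down to a compact representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
        )
        # Decoder: expand it back out into a full "Bot Ross" painting.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, blob):
        return self.decoder(self.encoder(blob))
```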

The second network is a “discriminator”, and it is the source of the feedback to the generator network. The discriminator is given the generator’s predicted image, and the real Bob Ross painting, and it is asked to guess which image is which. In other words, it learns to tell the difference between the “fake” Bob Ross paintings produced by the generator, and the real thing.
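A sketch of the discriminator, in the same hedged spirit: it takes a candidate painting (real or generated) together with the Blob Ross input it supposedly came from, and produces a score for how “real” the pairing looks. Conditioning on the input image and outputting a grid of scores are assumptions borrowed from standard conditional image-to-image GANs, not details from the essay.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores how plausible a (blob input, candidate painting) pair looks."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # a grid of real/fake scores
        )

    def forward(self, blob, painting):
        # Higher scores mean "this looks like a real Bob Ross for that input".
        return self.net(torch.cat([blob, painting], dim=1))
```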

When the discriminator correctly detects a fake, that detection is fed back to the generator as negative feedback, and the generator adjusts its process until it can more successfully fool the discriminator. By training in this way, the generator learns to avoid the “khaki smudge” problem. While its first attempts are very blurry, it quickly learns that the discriminator can detect this easily. It gets more cunning, producing sharper images and attempting to mimic the shapes seen in real Bob Ross paintings. The result is that, after sufficient time training, the generator can produce a somewhat-convincing facsimile of a Bob Ross painting.
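Putting the two networks together, one round of training might look something like the sketch below, reusing the Generator, Discriminator, and dataset classes from the earlier sketches. The optimiser settings and the extra L1 term nudging the generator towards the real painting are common choices for this kind of model, but they are my assumptions rather than details from the essay.

```python
import torch
from torch.utils.data import DataLoader

loader = DataLoader(BlobBobDataset("data"), batch_size=4, shuffle=True)

G, D = Generator(), Discriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = torch.nn.BCEWithLogitsLoss()
l1 = torch.nn.L1Loss()

for blob, bob in loader:
    fake = G(blob)

    # 1. Train the discriminator: real Bob Ross paintings should score high,
    #    the generator's fakes should score low.
    real_score = D(blob, bob)
    fake_score = D(blob, fake.detach())
    loss_D = (bce(real_score, torch.ones_like(real_score)) +
              bce(fake_score, torch.zeros_like(fake_score)))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # 2. Train the generator: its negative feedback is whatever the
    #    discriminator caught, so it adjusts to fool it next time. The L1
    #    term keeps its output loosely anchored to the real painting.
    fake_score = D(blob, fake)
    loss_G = bce(fake_score, torch.ones_like(fake_score)) + 100 * l1(fake, bob)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```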

Let’s see what that looks like in practice.

Above is an input image (Blob Ross), the real painting it was based on (Bob Ross), and the generator’s first attempt at predicting from that input (Bot Ross). The generator has never seen the real Bob Ross painting for this example — it has to generalise from the other paintings it’s seen. As expected, we have khaki smudges everywhere. There’s no defining detail, and by taking the average colour for each area, we’ve ended up with a blurry middle ground. But just like the original Bob Ross says, with practice, it can do better! Over time, with feedback from the discriminator, Bot Ross learns to make a better job of it.

The above images are taken from progressively more advanced versions of the model, as it trains. You can see the generator is learning to add more detail. The trunks of the trees get some shading, the sky takes on more texture, and the pine trees start to develop their characteristic jagged edges. But we also see some other interesting developments. There are weird lines and repeating blotches in the images. These are artefacts of the generator — it struggles to invent details for large areas of empty space, and tends to fall back on repeating the same patterns.

Eventually, after a few hundred rounds of training, the generator learns to avoid the worst of these artefacts, and its results become, if not indistinguishable to a human eye, at least much more credible attempts.

It performs notably better on some areas of the painting than others. The signature pine trees, which Ross produces with a few deft brush strokes in nearly every one of his paintings, are reproduced with fine detail — the blobs have been given jagged pine-needle edges and sharp points. Because the pine trees appear in a lot of the training data, and look more or less the same every time, Bot Ross is able to reproduce them quite well. The painting’s sky is a pleasing blur of clouds and sunshine — because there’s not much detail here, Bot Ross has learned that he can fill the sky with whatever vague smudges he likes, and the discriminator will be none the wiser.

But the cosy little cabin is a bit of a mess. Cabins are a perfect storm of challenges for Bot Ross. They are much less common in the training data than mountains or pine trees. Even worse, they look very different in each painting. Consequently, Bot Ross has struggled to learn how to paint them. He’s managed a suggestion of a snow-covered roof, but it’s mostly still just a blob.

What’s interesting is that Bot Ross has rendered this as a snowy scene. He doesn’t simply average the colours of the training paintings; he decides, based on the composition — maybe the presence of bare trees and so on — whether this should be a snowy or a lush scene, and colours the painting appropriately. Looking at other examples shows some of the interesting choices that Bot Ross makes.

You can see he’s sometimes very good, and sometimes quite bad. More conventionalised elements are rendered reasonably well, but anything out of the ordinary becomes a confusing mess of indistinct shapes.

The real challenge for Bot Ross, though, is to see how he fares with one of my original artworks. Here’s my painting from earlier, “Toothpaste Mountain”, alongside its “Bot Ross” interpretation.

Not bad! Not exactly up to the standards of the original Bob Ross, but certainly better than I could do given only a few hours of practice. Now there’s no need for anyone to waste countless hours slaving away at an easel, beating the devil out of their brushes, or hunting down some more Phthalo Green. With this Bot Ross model, anyone can paint like Bob Ross instantly!

The thing that’s really interesting to me about Bot Ross, and GANs in general, is the idea that “faking” doing something is the same as actually doing that thing. Our model learns to paint by learning to pretend to paint, by learning to make its own output indistinguishable from the real thing. There are obvious parallels here to Turing’s proposed “Imitation Game” in his 1950 paper “Computing Machinery and Intelligence”, which is the origin of the concept of the “Turing Test”. Briefly put, this is the idea that rather than trying to answer the question “Can machines think?”, which leads to pointless philosophical musing, we should address a more practical question: “Can a machine pretend to think in a manner sufficiently competent to convince a human observer?” In other words, don’t ask “Can machines think?”, ask “If I can’t tell the difference, what does it matter?”

Turing writes in more detail than I have room for here as to why this is a fruitful line of inquiry. But I like to think that Bot Ross provides intriguing support for that argument. Over half a century after Turing published his paper on the subject, we’re still learning about the complex and close relationship between learning and imitation, and the small, possibly nonexistent, difference between pretending to do something and actually being able to do it. Bot Ross has never experienced the joy of painting, as Bob Ross describes it. It has never imagined itself walking in the tiny landscapes it creates. But looking at its paintings, can we tell the difference? Does it matter?

Thanks for reading! The previous essay in this series, about text generation, is available here. The conclusion to this series will be published next month.
