I fed a bot eight hundred burgers, and this is what it threw up

Simon Carryer
Towards Data Science
13 min read · Aug 17, 2019


It seems there’s nothing a human can do that some clever jerk with a computer won’t come along and do better. First it was simple calculations, then it was factories. The computers beat us at chess in ’97, they mastered Go in ’15, and now they’re coming for our video games too. It’s starting to get a little bit galling. But there’s one area where we’ll always have the edge over our digital rivals: Creativity.

Right? A computer can’t produce a creative work. Can it? Can it imagine? Let’s find out. And what better test than that pinnacle of human ingenuity, that crowning glory of civilisation, the hamburger.

Every year in my home town there is a food festival in which restaurants from around the city compete to create the best, most innovative, most delicious burger. For the last six years I have diligently collected data about these burgers. In addition to statistics on the kinds of bun, patty, and other ingredients used in the burger, I have recorded the name and description of each burger — over 800 in total.

1815 Cafe and Bar’s “Sweet Baby Cheesus”: Crumbed mozzarella with smoked cheddar, parmesan cream, jalapeno relish on a herbed milk bun, with straw fries. Matched with Garage Project White Mischief — Salted White Peach Sour.

The burger descriptions are a testament to human ingenuity. Just about anything that can fit between a bun has made an appearance on the menu, and, for the most part, they show thoughtful consideration of flavour combinations. This is a remarkable feat of creativity — to imagine and execute a combination of ingredients that is novel and exciting, but that also tastes good.

Is this something we can teach a computer to do? We can at least try! Let’s build an algorithm that can produce a description of a never-before-seen hamburger.

We need an algorithm that can generate text. These are the same techniques that power chatbots and predictive-text features like pre-generated email responses and automatic sentence completion. They all share a common approach: given a word or phrase, what is the most likely word to follow? A sentence that starts “Yesterday I went…” might end with “…to the library”, but it could equally conclude with “…completely mad.” These algorithms attempt to calculate, given the context, which is the better choice.

Let’s explore some of these algorithms, see how they work, and hopefully produce some tasty burgers along the way.

The simplest way to achieve this is called a “Markov chain”. It works like the game of dominoes, in which tiles are laid down end to end, connected by matching numbers. Instead of numbers, however, our tiles hold words.

Does anyone actually use dominoes for this?

Imagine we take every burger description in our set, and we break it up into pairs of words. The Beef Wellington Burger’s “Char-grilled beef patty with field mushrooms” becomes “Char-grilled beef”, “beef patty”, “patty with”, “with field”, and so on. We break up every burger description this same way, and like dominoes, we jumble them together in a bag.
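The whole trick fits in a handful of lines of Python. Here’s a minimal sketch (the two descriptions are invented stand-ins for the real dataset of 800), including the chaining step described in the next paragraph:

```python
import random
from collections import defaultdict

# Invented stand-ins for the ~800 real burger descriptions
descriptions = [
    "char-grilled beef patty with field mushrooms and truffle butter in a brioche bun",
    "smoked beef patty with cheddar and pickles in a milk bun with fries",
]

# Break each description into "dominoes": for every word, record
# the words that have been seen to follow it.
follows = defaultdict(list)
for desc in descriptions:
    words = desc.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word].append(next_word)

def generate(seed, max_words=30):
    """Chain tiles together: repeatedly pick a random word that has
    followed the previous word somewhere in the real descriptions."""
    sentence = [seed]
    while len(sentence) < max_words and sentence[-1] in follows:
        sentence.append(random.choice(follows[sentence[-1]]))
    return " ".join(sentence)

print(generate("char-grilled"))
```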

Now, we take these tiles out of the bag one at a time. Just like dominoes we can lay them out, connecting matching words together. A tile that has the words “patty with” might connect to the tile that says “with field” — like the Beef Wellington Burger, but it could also connect to one that says “with fries”. In this way we put together burger descriptions that sometimes almost make sense. Here are a few burgers built using this method:

A pie-burger of cheese braised beef patty with pastrami Swiss cheese curds matchstick fries and gorgonzola patty vegan bun with woodfired beetroot ketchup.

Handmade all smoked beetroot bun with Indian spiced Ōtaki guacamole hard goats cheese sauce in Arobake paprika patty stuffed with handcut fries Matched w Garage Project beer battered onion in an Astoria bun with cabbage slaw in salsa rossa…

Spiced wagyu beef dripping fries (V)

These burgers have some problems! The first burger is almost credible, leaving aside the questions about how you “cheese braise” beef, or how a “gorgonzola patty” might taste.

The second burger has bigger problems — it has two buns, and far too many ingredients. The description barely makes sense.

Finally, the third burger isn’t a burger at all! It’s just some fries with beef dripping, somehow made vegetarian.

These burger descriptions expose some of the weaknesses of the Markov chain method. For such a simple algorithm, it produces surprisingly coherent results in this instance, but it’s ignorant of several important rules of burger construction: a burger should have just one bun, one patty (typically), and at most two or three other ingredients. If it comes with fries or another accompaniment, these should be introduced at the end of the description. The Markov chain approach can’t account for any of these rules.

We need something far more sophisticated: a model that understands something about the meaning of the words it’s using, and crucially, the meaning of the order of those words.

The problem we’re trying to solve is this: given one or more previous words, how do we predict the next word in a sentence? If a burger description starts “Pulled pork with…”, is the next word more likely to be something like “kimchi”, “marshmallow”, or “rat-poison”? To generate our burger descriptions, we start with one word, chosen at random from the real burger descriptions we’ve collected. Then we try to predict, given that initial word, what the most-likely second word will be.

If our first word is “smoked”, the second word is likely to be something like “venison” or “bacon”, and probably not “lettuce”. Given the second word, we feed the pair of words back into the model. Given those two words, what is the most-likely third word? “Smoked venison patty” is more likely than “smoked venison bun”, for example.
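In code, the generation procedure is just a loop: start from a seed, ask the model for the next word, append it, and ask again with everything chosen so far. In this sketch, predict_next_word is a hypothetical stand-in for whatever model we plug in; it isn’t part of any particular library:

```python
def generate_description(seed, predict_next_word, max_words=25):
    """Grow a burger description one word at a time, feeding everything
    chosen so far back into the model at each step."""
    words = [seed]
    for _ in range(max_words - 1):
        next_word = predict_next_word(words)  # e.g. ["smoked"] -> "venison"
        if next_word is None:  # the model may signal that the description is finished
            break
        words.append(next_word)
    return " ".join(words)
```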

The model we’ll use is called an “LSTM”, which stands for “long short-term memory”. Like the algorithm we used in a previous essay to classify images, this model is a “neural network”: it employs two or more “layers” of algorithms to first distill the raw words of the burger descriptions into meaningful data, and then make a prediction from this distilled “embedding”.

The LSTM employs two very important innovations. Firstly, it is a “recurrent” neural network. That means it is aware not just of the meaning of the words in the descriptions, but also of the meaning present in the order of those words. The second innovation is that this model can account for the varying importance of words in a sentence: it recognises that some details of the burger — like the bun and the patty — should show up only once. The model can choose to “hold on” to words that occurred early in the burger description, and use those important words to inform its choices of words later in the text.
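The article doesn’t include the model code, but a burger generator along these lines can be sketched in a few lines of Keras. The vocabulary size and layer widths below are placeholder guesses, not the values actually used:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size = 2000  # distinct words across the burger descriptions (a guess)

model = Sequential([
    # Distill each word into a dense "embedding" vector
    Embedding(input_dim=vocab_size, output_dim=64),
    # The recurrent layer reads the words in order, and can "hold on" to
    # important early words (the patty, the bun) to inform later choices
    LSTM(128),
    # One probability per vocabulary word: the model's guess at the next word
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# X: sequences of word indices; y: the index of the word that follows each sequence
# model.fit(X, y, epochs=4)  # each full pass over the data is one "epoch"
```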

Like other neural network models, the LSTM is trained iteratively. It starts out making very naive choices, and with each iteration (or “epoch”) of training, it becomes more sophisticated. Here’s the output of our model after just one epoch:

beef and and and and and and and and and and and and and and and and and and and and and and and and and and and and and and

It’s not an inspiring start. From the “seed” word we provided the model, “beef”, the model has chosen what it believes is the most-likely next word: “and”. Not unreasonable — “and” is the most common word in the descriptions overall. But the model makes the same guess again and again, never updating its prediction as the sentence grows longer. Let’s see how it does after another round of training:

beef patty with with and beef and and beef and beef and and and beef and and beef patty with and and and and beef in with a and and

Still not great, though it’s already made substantial gains on the previous attempt. It’s learnt that “patty” usually follows “beef”, and that “with” is another common word. Let’s keep training.

beef patty with bacon and smoked and smoked and beef patty with house-made and cheese and aioli in a pandoro bun with handcut fries

Now we’re getting somewhere! The model has learnt a few important things. It’s adding more ingredients to the burger, and it’s learnt to put those ingredients in a credible order: patty, other ingredients, bun, and finally fries. There are still too many “ands” and “withs”, but it’s clear progress. Let’s go again:

beef patty with smoked bacon and smoked cheese in a pandoro bun with hand-cut fries

This is just about perfect! Four epochs of training (about forty minutes on my laptop) have given us very strong results. This is a pretty terrible burger, but at least it makes sense.

From here, improvements to the model come much more slowly. It takes ten more epochs (and several hours) to reliably achieve a really credible burger:

beef patty with smoked cheddar tomato and kāpiti blue cheese in a pandoro milk bun with fries

The model has pretty much nailed the format of a burger description. It’s learnt phrases like “smoked cheddar”. It’s learnt that each ingredient should only occur once, though in this case it has both cheddar and blue cheese, which is an odd culinary choice. Aside from missing commas and capital letters (which were removed from the training data), this burger would not raise any eyebrows on a restaurant menu.

Here are a few more burgers created by supplying different “seed” words to the same model:

tapioca spiced lamb patty with smoked cheese and clam seasoning and pickles in a pandoro bun with fries

aged beef patty with smoked bacon and bacon jam tomato relish and rocket in a zaidas bakery milk bun with fries

pork patty with smoked cheddar and smoked cheddar slaw in a pandoro milk bun with fries

fried chicken with kimchi and red cabbage slaw in a pandoro milk bun with fries

cookie-crumbed village meats pork patty with smoked cheddar trade kitchen kingsmeade castlepoint feta and kāpiti bay butchery bacon in a clareville bakery bun with hand-cut fries

Pretty tasty! I don’t know about that cookie-crumbed pork or the tapioca-spiced lamb, but for the most part these look like pretty good burgers. The model has learnt a lot of complex rules of burger construction, which has enabled it to reliably produce a convincing facsimile of a restaurant’s burger description.

But the above selection of burgers conceals a weakness of our model. Looking at a wider selection, the burgers it produces are almost all extremely similar. The model has patterns it repeats over and over: “smoked bacon and bacon jam tomato relish and rocket” shows up frequently, and every burger comes with fries, occasionally hand-cut. Furthermore, the range of burgers generated is pretty narrow. Most are beef burgers. Very few stray from traditional burger norms. Our model is a very unimaginative chef.

Here’s what has happened. The model learns to predict each word in a sentence, and it is penalised whenever it guesses incorrectly. This helps the model learn the structure and patterns of burger descriptions, but with a relatively small dataset it also pushes the model towards very conservative choices. It learns to make the most-likely choice at each juncture. It is not trying to be creative, it is trying to be correct.
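To make that concrete, imagine the probabilities the trained model might assign to the word following “beef patty with” (the numbers here are invented for illustration). The error-minimising move is to take the single most-likely word every time, which is also the least surprising thing the model could possibly do:

```python
import numpy as np

# Invented probabilities for the word following "beef patty with"
vocab = ["smoked", "bacon", "cheese", "kimchi", "marshmallow"]
probs = np.array([0.40, 0.30, 0.25, 0.04, 0.01])

# The "correct", error-minimising choice: the most-likely word, every single time
print(vocab[int(np.argmax(probs))])      # always "smoked"

# Sampling from the distribution instead would occasionally surprise us
print(np.random.choice(vocab, p=probs))  # usually "smoked", very rarely "marshmallow"
```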

This is a fundamental constraint of all artificial intelligence algorithms. At their core, they operate on the principle of minimising their errors. Our burger generator has simply learned the principles that define the average burger, and is trying to make the least surprising choices it can. It will never include any ingredients that it hasn’t already seen. It will never come up with new cooking techniques. It might be more adept than the Markov chain algorithm at mimicking the pattern of a burger description, but it is essentially still just rearranging pieces of the descriptions it was trained with. It is a bit of a disappointment. Can we do better?

Turns out yes! While we can’t get around the core constraint — that these algorithms in their hearts desire to minimise surprise, not maximise delight — what we can do is dramatically expand the scope of the information the generator has to draw on, and also increase the complexity of its calculations. We need to delve into hard, cutting-edge research — the kind of stuff that needs huge supercomputers and teams of people with degrees in unpronounceable fields. Stuff that’s way beyond my understanding. To access this technology, we’re going to use a far older and more straightforward technique. We’re going to copy someone else’s work.

In early 2019, OpenAI released their “GPT-2” model. This is a “general language model”. It is to our burger generator as a nuclear power plant is to a camp stove. Our burger generator has about two hundred thousand parameters — individual features it can learn about the data. GPT-2 has over a billion. While our burger generator was trained on eight hundred burger descriptions — around 25,000 words, or 140 kilobytes — GPT-2 was trained on 8 million web pages — 40 gigabytes of data. You can read more about this model on the OpenAI website, but you’ll probably have more fun experimenting with its responses in real time.
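The article doesn’t say what tooling was used to run the model, but the publicly released GPT-2 weights can be loaded and sampled in a few lines of Python. One option (an assumption on my part, not necessarily the author’s setup) is Hugging Face’s transformers library; fine-tuning on the burger dataset is a further step, for which packages like gpt-2-simple were a popular choice at the time:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hand the general-purpose model a seed word and let it rattle on
input_ids = tokenizer.encode("Beef", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,  # sample from the distribution rather than always taking the top word
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```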

This hugely powerful model operates on similar basic principles to our previous burger generator, just scaled up massively. Trained on a huge volume of text, it doesn’t just know how to make burgers, it knows how to generate any kind of text. Given a seed word, “Beef”, it will confidently rattle off a string of sentences on that theme:

Beef stew at a roadside restaurant in Chiang Mai can be a source of instant nostalgia. That’s because the town’s long-term favourite dish, beef stew, came from the village, and it’s been around there for over 1,000 years

This is intriguing, but it’s not making burgers. To achieve that, I retrained the model on the same dataset of burgers as before, although as an extra challenge, I also included the burger names. The model retains its complexity and depth of knowledge of the text it’s seen, but it learns to focus that training on creating burgers. At first, its efforts are pretty bizarre:

…end of a large family, which in all probability, is a good idea. This is what the “craziest” people in the world say about the most recent study that is, as it were, still relevant; and, even more importantly…

But very quickly it learns to favour burger-like words:

…was a very good vegan sweet and spicy fried chicken with sesame tomato ketchup mustard pickle fries with mint and prawn-braised poblano patty [The Baked Potato]

Favourite Baked Potato, with a bacon…

Interestingly, as it learns, it retains some quirks of the data it’s been trained on. Here it gives a photographer credit for a nonexistent illustration.

Mushrooms, with a hint of housemade red cheese patty, lettuce, red onions and caper. Packed in a ziploc bag. (Picture: Cribra Zaki)

And here it seems quite excited about the burger it’s discovered:

Oh boy, a fried chicken bak was a B-movie, with fries and crispy onion bak.

Before long, however, the generator is producing a wide range of novel, sometimes unappetising, but definitely original burgers:

Midtown Hangover: Beef patty with fried onions, crumbed mozzarella, avocado, red cabbage miso, lettuce, lettuce pickles and Zany Zeus hollandaise in Kairanae honey roasted bread.

Sir Patrick Stewart: Beef patty hails from Dorset with Kingsmeade-spiced fennel, pecorino, rojo cheese, kiwifruit slaw, herb masala pepper mayo and johnbé dip in a beetroot bun, with baby corn popcorn-salt tahini coleslaw.

I Havoc ’Em All: Beef with rib eye meat, potato and cucumber chutney, smoked pretzel, cinnamon potato, beetroot chutney and mango chutney in a rye bun, with red kiwifruit nuggets and confit duck seed slaw.

Lady Chook Chook: Crispy fried chickpea with shoestring cheese, garlic butter, BBQ pork sauce and star ancho chutney in an Arobake milk bun, with poached piccalilli and crispy poached egg.

Fat Chick Burger: Fried sweet, spicy chicken thigh with chunky julienned pico and chipotle slaw in a soft shell that was cut with a slaw glaze.

Jammy Cow Meat Burger: Beef patty, hay haka feta, hops butchery blue cheese, creamy gravy, crunchy greens and caperised pea fritter in a sesame soft rice bun.

The Notorious Redeemer Last Spring sliders all spring and summer: Nacho Burger; Huffman Burger; Roma beef patties with Toucan ricotta gravy; Venison slaw with crisp rosemary hash and house cured pineapple, with soft cheese sauce and hand cut apple slaw.

Dubious Pig Pig Off White House Meat with fried onions, fried beetroot slaw and a bajillion green leaves, with kūmara fries.

So has the computer triumphed once again? Can the chefs of the world hang up their toques, trusting in the algorithm to invent more creative meals than they could ever imagine? Have we taught a machine to be creative?

This one’s going on the refrigerator

Not really. As we learned, our models, even the most sophisticated ones, are not attempting to be creative at all. They’re just trying to correctly guess the next word in a sentence. The extent to which they appear creative to us is a function of them failing, not succeeding. The final model has learnt the structure of English sentences very well, and even some semantics about how verbs and adjectives are applied to modify foods: “poached egg”, “crunchy greens”, “crumbed mozzarella”. But it hasn’t quite learned the rules of what constitutes a “normal” burger, and consequently it makes unusual, surprising choices. We can read these choices as “creative”, but they’re in no way the product of a deliberate effort towards a novel creation.

To make a genuinely creative algorithm, we’d need to reward “creative” choices, and penalise less-creative ones. We’d need to be able to measure creativity. This exposes the futility of our endeavour. What is creativity, anyway? What is a chef doing when they invent a new recipe? What is an artist thinking when they put paint to canvas? To some extent they are doing the same thing as our algorithm — they are working within the constraints of their chosen medium; but they’re also pushing against those constraints. They’re trying to express something.

Does that matter? Maybe not. If someone sitting down to order a “Jammy Cow Meat Burger” or a “Midtown Hangover” can’t tell that their burger was created by a computer rather than a human chef, what difference does it make? Would the Jammy Cow’s “caperised pea fritter” taste worse for not having been conceived by a human mind? Would it seem any less creative? What if “creativity” isn’t in how a chef comes up with their recipes, or in how an artist chooses their paint? What if it exists only in the mind (and stomach) of the person eating the burger? We can’t make a creative machine because we can’t measure creativity. Maybe that’s because there is no measure for it — creativity doesn’t exist in the act of creation, but in the act of perception.

These questions go beyond machine learning and burgers. But in another sense, they are machine learning’s central concern: What does it mean to think like a human being? It’s food for thought.

Thanks for reading! The previous essay in this series — on image recognition — is available here. The next essay will be published next month.
