The Brains in Games: Video Game AI

How Computers Think: Part Five

Simon Carryer
Towards Data Science

Thus far, the kinds of artificial intelligence we’ve explored have mostly been confined to the realm of transforming and manipulating data — making predictions, calculating similarity, and deriving meaning. The low cost of computing power and the ability to collect data at massive scales through the internet have created new opportunities to deploy these algorithms in hitherto impossible ways, and it is these kinds of algorithms that online platforms use to dredge users’ data for scraps of value, like digital shale oil.

But there’s another type of artificial intelligence, almost wholly unrelated to what we’ve so far explored. This class of intelligence is perhaps even more ubiquitous, and certainly more visible, though it is constrained for the most part to a small part of most people’s lives. This is the artificial intelligence of video games. Video games employ algorithms to respond intelligently to players’ inputs. From the predictable patterns of the alien ships in “Space Invaders”, to the much more responsive ghosts in “Pac-Man”, the earliest video games used mathematical processes to mimic the behaviour of thinking beings. Over time, these technologies have evolved to become extremely sophisticated and produce incredibly complex behaviours. But just like the data manipulation techniques we’ve seen previously, at their core lie some surprisingly simple tricks of geometry.

In this essay we will get to know three classes of algorithms that underpin the behaviour of characters in video games: “steering”, which governs simple movement patterns; “pathfinding”, which plots a route through a complex environment; and “goal-oriented action planning”, which is one way of formulating more complex behaviours.

Steering

“Steering” is the name for the class of algorithms that control movement of entities in video games. Every time a computer-controlled character moves, it does so in response to the instructions of a steering algorithm, a series of mathematical steps which reads the state of the game, calculates the position of each element, and translates that into an instruction to move in a particular direction.

To help us explore these algorithms, I’d like to introduce you to a wee fellow I have created for just this purpose. He’s not much to look at, just a red dot on a black background, and right now he doesn’t know how to do anything at all. But we can teach him, and hopefully learn something in the process.

A close-up of our boy. He doesn’t do much right now.

The first, and most elementary, of the steering behaviours is called “seek”. With no goal, our boy just sits in one spot. To get him moving we need to give him something to aspire to. To start with, we’ll instil in him an urgent desire to be close to the mouse pointer. This will let us steer him about and see how he changes direction. The “seek” algorithm is incredibly simple. The boy calculates the direction to his target, and accelerates as fast as he can in that direction.
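Here is a minimal sketch of “seek” in Python, assuming positions are simple (x, y) tuples and acceleration is capped at some maximum; the names (`seek`, `max_accel`) are mine, not taken from the essay’s original code:

```python
import math

def seek(position, target, max_accel):
    """Accelerate as hard as possible, straight towards the target."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return (0.0, 0.0)  # already on top of the target
    # Normalise the direction, then scale it to full acceleration.
    return (dx / distance * max_accel, dy / distance * max_accel)
```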

Despite being little more than a few lines of code and a red dot on a screen, once he came to life and began hurtling about the screen, it was impossible for me not to imagine in him some scant scraps of a personality. My first iteration had the boy moving too swiftly, and his position always matched the mouse pointer exactly. This version of the boy had no personality at all — he was simply an extension of the mouse pointer, an object. But as soon as I slowed him down, made him struggle to catch up to the pointer, he became something separate, a person in his own right, something I could empathise with.

The boy zooms pleasingly about the screen, eagerly chasing after the mouse pointer and swerving around as his target moves. But when he reaches his goal, something odd happens. He runs right past the mouse pointer, and then, like an excited puppy, dances back and forth over his goal. Our “seek” algorithm has a problem. Because the boy always accelerates as fast as he can towards his goal, when he reaches it, he is destined to overshoot. Like a reverse Tantalus, he can never reach his goal, but only exceed it. To fix this, we must give him one more behaviour: “arrive”.

“Arrive” is almost as simple as “seek”. The boy calculates the distance to his target, and if the distance is very short, he reduces his acceleration. By the time he is standing directly on top of the target, his acceleration is reduced to zero. This lets our boy stop right on top of his goal. Armed with these two behaviours, our wee boy happily zooms about the screen after the mouse pointer, and stops smartly once he reaches it.
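As a rough sketch, “arrive” might look like this, reusing the `math` import from the sketch above and assuming a hand-tuned `slow_radius` inside which the boy eases off:

```python
def arrive(position, target, max_accel, slow_radius=50.0):
    """Like seek, but ease off as we close in on the target."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    distance = math.hypot(dx, dy)
    if distance == 0:
        return (0.0, 0.0)
    # Inside slow_radius, acceleration falls off linearly, reaching
    # zero when the boy is standing directly on the target.
    strength = max_accel * min(distance / slow_radius, 1.0)
    return (dx / distance * strength, dy / distance * strength)
```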

I spent an unreasonable amount of time doing this

The opposite of “seek” is “flee”, and it is in many ways even more simple. As before, the boy calculates the direction to his target, but instead of accelerating towards it, he accelerates away. Using this behaviour, our happy little boy becomes a coward, running away from the mouse pointer and hiding in a corner. He only leaves this hidey-hole when the mouse pointer moves around behind him, chasing him out in the other direction.
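In code, “flee” really is just “seek” with the sign flipped, reusing the sketch from earlier:

```python
def flee(position, target, max_accel):
    """The mirror image of seek: accelerate directly away from the target."""
    ax, ay = seek(position, target, max_accel)
    return (-ax, -ay)
```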

If I was pleased by the boy when he chased the mouse pointer, I was even more charmed by this cowardly version. His abject fear, and the way he would quiver if the mouse pointer approached him while he was trapped in a corner, conjured in me an immense sympathy. Intellectually, I know that the boy is just some lines of code, a geometric calculation that determines an angle in response to some input parameters, and moves a dot on my screen accordingly. But emotionally, it is impossible not to read the entirely deterministic output of the “flee” algorithm as the expression of an emotion.

The boy’s tendency to run directly into obstacles, and to stick there until his goal has moved, is a glaring deficiency in our boy’s behaviour. In open ground he seems very clever, taking a direct route towards his target. But put a wall in his path, and the simplicity of his seek and flee algorithms becomes apparent. He runs straight into the wall, and continues to push against it as hard as he can, oblivious to his lack of progress towards the target. This will not do. To help him out, we need to give him one more behaviour, somewhat more sophisticated than those we’ve seen until now. This is called “avoid”.

“Avoid” lets our boy swerve to avoid obstacles in his path, but to achieve this we need to expand our boy’s knowledge about the world around him. Until now, he has known only two facts about the world — his own position, and the location of his target. Now, we need him to look ahead of himself, and identify anything that blocks his path. This is done with simple geometry. We draw a line in his direction of travel, out to a distance determined by how fast the boy is moving — the faster he goes, the earlier he must start swerving to avoid a collision. If the line we draw intersects with an obstacle, such as a wall, then the boy knows that he must change course or else collide. The direction in which he must change course is determined by the projected intersection point with the obstacle. We draw a line between the centre of the obstacle, and the point at which the boy has projected he will collide with it. This line, or “vector”, points in the exact direction the boy must swerve to avoid the obstacle. Employing the mathematics of vectors, we can add these two lines together, and the resulting vector represents our boy’s new, amended path, steering him safely out of harm’s way.
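A sketch of the idea, simplified to a single circular obstacle (the essay’s walls would need a line-segment intersection test in place of the distance check); the lookahead multiplier is an arbitrary tuning constant of my own:

```python
def avoid(position, velocity, obstacle_centre, obstacle_radius, max_accel):
    """Project a point ahead of the boy; if it falls inside the
    obstacle, steer away along the centre-to-collision vector."""
    speed = math.hypot(velocity[0], velocity[1])
    if speed == 0:
        return (0.0, 0.0)
    # Look further ahead the faster the boy is moving.
    lookahead = speed * 1.5  # tuning constant, chosen by eye
    ahead = (position[0] + velocity[0] / speed * lookahead,
             position[1] + velocity[1] / speed * lookahead)
    # Vector from the obstacle's centre out to the projected point.
    dx = ahead[0] - obstacle_centre[0]
    dy = ahead[1] - obstacle_centre[1]
    distance = math.hypot(dx, dy)
    if distance > obstacle_radius:
        return (0.0, 0.0)  # the path ahead is clear
    distance = distance or 1.0  # guard against a dead-centre hit
    return (dx / distance * max_accel, dy / distance * max_accel)
```

Adding this steering vector to the boy’s seek vector gives the new, amended course.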

Armed with this new knowledge, our boy can hurtle about the screen, and whenever an obstacle comes between him and his goal, he neatly swerves around it.

Our boy is growing up!

There’s one final behaviour in the set of “steering” algorithms, and it deals with the case where there is no goal at all. Left to his own devices, with no goal to flee or seek, our boy sits still. But what if we want him to have his own, self-motivated movement? When he has no other goal, we want the boy to toddle about on his own, heading in no particular direction but instead exploring the world more-or-less randomly. This behaviour is called “wander” and it is the most complicated of the algorithms we’ve seen so far.

The most simple implementation of “wander” has our boy choose the direction of each step he takes completely at random, with no reference to where he is or where he has been. But this produces an unsatisfying result. The boy jitters in place, making little headway in any direction. The boy is no longer an eager young chap dashing about the screen, but an anxious wreck, shivering on the spot.

Another approach is to have the boy choose a goal at random, head towards that, and only choose a new goal once he has reached that point. This gives the boy more appearance of purpose, but it’s still unsatisfying. The boy will run halfway across the screen, only to reverse course and head back in the other direction, as if he had just discovered he’d forgotten something. He runs at walls, and although the “avoid” behaviour stops him from colliding with them, it makes his behaviour look odd and robotic.

Creating a convincing “wander” behaviour turns out to be surprisingly difficult, and the choices are much more guided by aesthetics than by mathematics. There’s no one correct choice, but rather we have to be guided by our taste — what looks natural to us? The algorithm I chose has the boy travel in a straight line, but adjust his course by a small margin every so often. The result is that the boy noodles about pleasingly, turning about every now and then, busily exploring the world a little like an ant searching for food.
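The version described here might be sketched like so, with the turn frequency and turn size chosen purely by taste:

```python
import random

def wander(heading, turn_chance=0.05, max_turn=0.5):
    """Keep the current heading (in radians), but every so often
    nudge it by a small random turn, so the boy noodles about rather
    than jittering in place or beelining across the screen."""
    if random.random() < turn_chance:
        heading += random.uniform(-max_turn, max_turn)
    return heading
```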

Each of the “seek”, “flee” and “wander” behaviours gives the boy a distinct personality. When he seeks, he is eager and attentive. When he flees he is timid and cowardly. When he wanders, he is curious and busy. We can see these personalities even more clearly if we give our boy some friends. Our original boy will remain chasing the mouse pointer, using the “seek” behaviour. His first friend we’ll call “cowardly boy”. Using the “flee” algorithm, he runs from anyone who approaches too close to him — perhaps not an ideal friend. We’ll remedy that with the next friend, “friendly boy”, who uses “seek” to follow anyone who approaches him. Finally, we’ll add a “busy boy”, who ignores everyone and simply wanders around the screen.

These four lads all interact with each other in characteristic ways. Our original boy in red zooms about after the mouse pointer, pulling the friendly yellow boy in his wake. If this enthusiastic duo approaches the cowardly blue boy, he dashes into a corner to hide. Meanwhile, the busy boy goes about his business oblivious to it all, unless he happens to wander close to one of the others, in which case his “avoid” algorithm lets him swerve around them. Watching them chase each other about, their personalities are even more apparent, and it’s even harder not to imagine their movements as expressions of their feelings, rather than simply a geometric operation. When the friendly boy latches onto the cowardly boy and chases him into a corner, oblivious to his obvious distaste for company, it’s hard not to sympathise with the cowardly boy, shuddering in the corner while the friendly boy bounces around him.

Poor yellow boy. Nobody wants to be his friend.

These steering behaviours are the building blocks of more complex artificial intelligence systems in games, but they are hardly intelligent at all. They are far less sophisticated than even the most simple of the algorithms we saw in the previous chapters. But what is fascinating about them is how, while they perhaps lack intelligence, they have another even more intangible quality: personality. Steering algorithms are little more than geometry, but when I see the little boy I created zooming about, I can’t help but ascribe to him motivations and feelings. Despite knowing that his behaviour is nothing more than the output of some very rudimentary lines of code, I feel like there must be something more. His personality is not a quality of the boy, it’s something that exists entirely in my own mind. He becomes more than the sum of his parts, he becomes a person. But that person lives only in my imagination.

Why is it so easy for me to see the boy as having some small scrap of personality, while the same is not true for the vastly more sophisticated algorithms we’ve seen so far? What are the qualities that encourage me to empathise with the boy? Is this “intelligence”, or is it something else? As I look at more sophisticated video game intelligence systems, I’ll return to this question.

Pathfinding

Our boy is pretty clever! He can zoom around the screen, chasing the mouse pointer, and even swerve to avoid obstacles that get in his way. The simple steering algorithms that guide his movement produce a complex pattern of behaviours that give the illusion of “smart” behaviour. But there’s one area where our boy’s simple brain lets him down. A straight wall or another boy in his path doesn’t faze him — he simply veers away until he has cleared the obstacle. But more complex shapes — a dead-end corridor or even a deep corner — present an insurmountable barrier. Any time he has to briefly move away from his target in order to clear an obstacle, the boy is completely stumped. He stays trapped in the corner, sadly bumping his head against the wall, with no idea how to proceed.

To help him, we need to give him some way of thinking ahead — of understanding that in order to move forward, he must first head backwards. The way we’re going to achieve this is called “pathfinding”. We’ll give the boy an imaginary map which lets him plot a course to his goal, avoiding intervening obstacles, and reversing direction when he needs to escape from dead ends.

Imagine that there’s an invisible grid overlaying the world our boy is exploring. The grid is composed of dozens of nodes, each connected to its neighbours by a hair-thin line. Where that line crosses only empty space, we call this an “edge” in our network. But when the line is impeded by an obstacle — a wall or one of the boy’s friends — then we say that the nodes are “unconnected”. The result is a kind of negative-space version of the world. Where there is open space, the grid is a regular mesh of nodes and edges connecting them. But walls or other obstacles puncture the mesh, creating gaps in the network.

This grid is the map our boy uses to plot a course. Instead of accelerating directly towards his goal, the boy is more considered. He finds the node in the grid which is closest to himself, and the node which is closest to his goal. Then he calculates a path along the grid’s edges, through all of the intervening nodes. Where there are no walls, the path is a straight line through those nodes which are directly between him and his goal. But where there are obstacles in his path, the grid’s edges do not connect across these. The boy’s path must route around this gap in the grid. With this path in mind, the boy can confidently find his goal. Instead of shooting off directly towards his target, he heads towards each successive node in his path, assured that, since an edge connects the nodes, there are no walls in his way.
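A minimal sketch of building such a grid, assuming square cells addressed as (x, y) and a caller-supplied `blocked` test that reports whether a wall lies between two neighbouring cells:

```python
def build_grid(width, height, blocked):
    """One node per cell; edges connect neighbours, except where
    blocked(a, b) says a wall cuts the line between them."""
    edges = {}  # node -> list of connected neighbouring nodes
    for x in range(width):
        for y in range(height):
            node = (x, y)
            edges[node] = []
            for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                nx, ny = neighbour
                if 0 <= nx < width and 0 <= ny < height:
                    if not blocked(node, neighbour):
                        edges[node].append(neighbour)
    return edges
```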

How the boy constructs the path through the nodes and edges is one of those pure-maths problems that is intensely interesting to a very specific set of people, and incredibly dull to everyone else, so I will skip that here. Suffice it to say that it is a well-explored area of mathematics that can reliably find more-or-less optimal paths through an arbitrary grid at extremely impressive speeds. This impressive speed means that our boy can recalculate his path as his goal moves, seamlessly reorienting to duck around walls or change direction completely to find his way unerringly to his target, wherever it hides. Confronted with even the most fiendish of mazes, the boy sets off immediately, never taking a wrong turn, and zooming through the twists and turns without hesitation.
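For the curious, here is a minimal sketch of one textbook approach, breadth-first search over the grid built above (games more commonly use A*, which adds a distance heuristic to speed up the search, but the idea is the same):

```python
from collections import deque

def find_path(edges, start, goal):
    """Breadth-first search: shortest path (by edge count) through the grid."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the chain back to the start
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for neighbour in edges[node]:
            if neighbour not in came_from:
                came_from[neighbour] = node
                frontier.append(neighbour)
    return None  # the goal is unreachable
```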

In fact, the boy’s unerring skill at solving mazes has an interesting, counterintuitive effect. Previously, the boy’s eager seeking of his goal, and his thoughtless blundering into corners and dead-ends had a kind of naive charm to it. He was relatable in his overeagerness. Now, his implacable tracking towards his target and unhesitating avoidance of anything in his path eliminates that charming fallibility. The illusion of the boy having thoughts and desires is shattered. Instead of an eager boy, we have an unthinking automaton. By making the boy smarter, we have, ironically, made him seem entirely mindless.

The problem is that the boy has been given perfect knowledge of the maze, and can recall that information instantly and without error. We can no longer so easily identify with his situation. To bring his behaviour back into the familiar realm of the human, we have to reintroduce elements of discovery and error. This also requires us to add a great deal of complexity.

Previously the boy was given a map of the maze — a network of nodes and edges showing where walls blocked his path. Now, we give him a blank slate. His network of nodes and edges starts off fully connected, with each node having an edge connecting to its neighbours. The boy believes that all areas can be reached from anywhere, until he learns otherwise.

This learning process is governed by a list of known walls he keeps in his memory. Every wall he knows about is stored there. When the boy sees a new wall, he pops its dimensions into his memory-bank, and he updates his understanding of the world. He finds all the edges in his map of the world which intersect with this new wall, and prunes them from the grid. Then he recalculates his path to his goal. Where before a path existed, now the edges between the nodes have been removed, and the path is gone. The boy must find a new route. As the boy moves around the maze, he encounters new walls and updates his map accordingly. When he discovers a dead end at the end of his route, he turns around and tries another way.
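A sketch of that pruning step, reusing the grid representation from earlier and assuming a caller-supplied `intersects` test:

```python
def learn_wall(edges, known_walls, wall, intersects):
    """Record a newly seen wall, then prune every edge it cuts.
    intersects(a, b, wall) is assumed to report whether the wall
    crosses the straight line between nodes a and b."""
    known_walls.append(wall)
    for node, neighbours in edges.items():
        edges[node] = [n for n in neighbours if not intersects(node, n, wall)]
    # The caller then re-runs find_path() to plot a fresh route.
```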

Clever but a bit clumsy

When we watch the boy traverse the maze with this new algorithm, he is relatable again. He stops, reconsiders, turns around and tries new approaches. We watch him take a wrong turn, discover his error, and then return to try another way. He takes far longer to complete the maze than previously, due to his many missteps. But, oddly, he seems more intelligent, not less. He’s slower at completing the maze, but now he’s genuinely solving a puzzle, and watching him, we can empathise with him. He learns from his mistakes, and slowly becomes an expert at traversing the maze he’s learnt.

Our boy is now quite sophisticated. He has a memory, he can see things in the world around him, and he can make plans and update them as his understanding of the world changes. But we’ve also found something interesting: What makes the boy seem intelligent isn’t how sophisticated his processes are, or how good he is at completing tasks. “Intelligence” is something less tangible. It’s about how much the boy’s behaviour resembles human behaviour. It’s about whether, looking at him, we can see ourselves.

Goal-Oriented Action Planning

We’re nearly done with our boy. We’ve taught him steering, so he can seek out or flee a goal, and avoid obstacles along the way. We’ve taught him pathfinding, so that he can find his way to a goal even if that requires him to navigate a complex path. We’ve even given him a rudimentary memory — he can remember the walls he’s encountered, and use that memory to update his plans for getting to his goal. But all this goal-seeking feels a little short-sighted. Our boy is a world-class talent at seeking a spot on a screen, but he has little idea what he’s going to do once he gets there. He’s got short-term goals, but no longer-term ambitions.

Let’s fix that. Let’s give our boy some facility for making plans: “first I’ll do this, then I’ll do that”. We’ll give him some proficiency not just at goal-seeking, but actual problem solving. This is what allows, for example, the guards in an action video game to patrol an area, and attack any intruders they discover. It’s the difference between the ghosts in Pac-Man, which each follow their own simple routine to chase the player around the maze, and the characters in The Sims, who lead complex lives and follow sophisticated routines, only needing the barest guidance from a human player.

There are lots of ways to do this, but the way we’re going to use is one that’s been used in some form in many popular video games. It’s called “goal-oriented action planning”, and it draws on some of the same pathfinding algorithms we’ve already learned about.

Imagine you’re in a locked room, and you want to get out. In the room, you can see a key. In your mind, you form a plan: go to the key, pick up the key, go to the door, unlock the door, open the door, leave the room. It’s not a complex plan, but it’s quite important. You can’t open the door until you’ve unlocked it. There’s no point in going to the door until you’ve got the key. The sequence of actions needs to be performed in a particular order to succeed.

Imagine a more complex scenario: getting ready to go to work in the morning. There’s a whole host of actions you need to take in order to get to work, dressed, clean, and on time. Some of them can happen in any order, some of them have complex requirements. To eat your breakfast, you need to go to the fridge, take out the milk, go to the counter, take out a bowl, and so on. To take a shower, there’s a whole other sequence of actions you must take. But whether you take a shower first, or eat your breakfast first has no bearing on how successfully you eventually get to work.

The key insight of goal-oriented action planning is that these sequences of actions can be expressed using the same “nodes and edges” concepts that we’ve used for pathfinding. The simple locked room scenario is a straight path. To get to “pick up key”, you must pass through “go to key”. To get to “unlock door”, you must have passed through both “pick up key” and “go to door”. There’s only one way of ordering the sequence to achieve the final goal. The “getting ready for work” graph is much more complex. There are straight paths within — the “fridge — counter — bowl — breakfast” path, for example. But linking these is a complex web of connections — things that can happen in any order — you can brush your teeth in the shower, if you choose (though it feels wrong to me, somehow). You can put on both socks, then both shoes, or complete one foot’s dressing completely before moving on to the next (which also feels wrong to me — like one foot is briefly overdressed).

That over-long digression into my opinions on morning routines is a complex way of describing something that, in the abstract language of computers, can be represented quite simply, by two linked concepts: “States” are things which are true about the world. Have you showered? Have you got your toothbrush? “Actions” are things you can do. “Get in the shower”, “Pick up your toothbrush”. Actions have consequences — they change the state of the world — and they also have requirements — states that must be true for the action to be taken. In the language of algorithms, states are nodes, and actions are edges — they form a “graph”, exactly like the grid that helps our boy navigate the maze, but this time it’s a map of abstract concepts, not physical space. To make a plan, the boy calculates a path through this network of states and actions, until he reaches our “goal” state.
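Here is a toy version of that idea in Python, using the locked-room plan from earlier. The action names are illustrative, and for simplicity states only ever accumulate (a fuller planner would also let actions make states false):

```python
from collections import deque

# Each action: (name, states it requires, states it makes true).
ACTIONS = [
    ("go to key",   set(),                        {"at key"}),
    ("pick up key", {"at key"},                   {"has key"}),
    ("go to door",  set(),                        {"at door"}),
    ("unlock door", {"has key", "at door"},       {"door unlocked"}),
    ("open door",   {"door unlocked", "at door"}, {"door open"}),
    ("leave room",  {"door open"},                {"outside"}),
]

def plan(start_states, goal_state, actions):
    """Breadth-first search through world-states: find the shortest
    sequence of actions that makes the goal state true."""
    frontier = deque([(frozenset(start_states), [])])
    seen = {frozenset(start_states)}
    while frontier:
        states, taken = frontier.popleft()
        if goal_state in states:
            return taken
        for name, requires, effects in actions:
            if requires <= states:  # every requirement is satisfied
                new_states = frozenset(states | effects)
                if new_states not in seen:
                    seen.add(new_states)
                    frontier.append((new_states, taken + [name]))
    return None  # no sequence of actions reaches the goal

print(plan(set(), "outside", ACTIONS))
# -> ['go to key', 'pick up key', 'go to door',
#     'unlock door', 'open door', 'leave room']
```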

This is something our boy already knows how to do! We’ve already learned about pathfinding, and about building a grid of nodes and edges representing physical paths. The only change is to build this grid in the realm of more abstract states. To help us do that, we’re going to give our boy a job. And like many young people getting their start in the world, his first job is going to be in retail.

First we’ll set up his shop. We’ll add some walls representing aisles, a little room in one corner representing the storeroom, and a small box representing the counter. All of this is created using the same concepts as we’ve used previously to create a maze which the boy can navigate. The boy can explore his shop in exactly the same way. The boy has one job for now: cleaning up after customers. Like many retail employees, the boy doesn’t have a lot of options about how to do his job. His understanding of the world can be represented by a very simple network:

Each circle is a “state”, and the lines between them are “actions”. To achieve his goal — the “Mess cleaned” state — he has to take the “clean mess” action. But he can only take that action when he is in the “close to mess” state. To get there, he has to take the “go to mess” action, which is only possible when he’s in the “can see mess” state. What “goal-oriented action planning” achieves is allowing the boy to navigate a path through these states and actions. He understands the order in which he must take the appropriate actions. What’s more, if he’s interrupted in these actions, he can pick up where he left off.
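Expressed in the same toy representation as the planner sketched earlier (the names, again, are illustrative), the boy’s cleaning duties form a two-step graph:

```python
SHOP_ACTIONS = [
    ("go to mess", {"can see mess"},  {"close to mess"}),
    ("clean mess", {"close to mess"}, {"mess cleaned"}),
]

# With the plan() sketch from earlier:
# plan({"can see mess"}, "mess cleaned", SHOP_ACTIONS)
# -> ['go to mess', 'clean mess']
```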

Here is what this looks like in practice. The boy wanders around aimlessly until he spots a mess, and then he kicks off a simple routine: go to the mess, clean it up. Going to the mess, the boy must engage his physical pathfinding and steering skills to navigate to the correct location. To clean it up, the boy might pause for a set period. In a more sophisticated simulation, we might play an animation of the boy furiously mopping.

What drudgery! At this stage, the boy’s behaviour is hardly more sophisticated than it has been before. Our boy is wasting his potential! It’s time for him to take on greater responsibilities. Having carried out his mess-cleaning duties faithfully for a time, let’s give him an additional task. No shop functions without customers, so let’s introduce some. The customers, just like the boy, have a series of states and actions they can take — looking for an item in the shop, going to it, picking it up, taking it to the counter, waiting for assistance, paying for the item, and leaving the shop. The boy gets a complementary set of states and actions: if a customer is at the counter, go to the counter, and sell them the item.

Now he has two priorities. If there is a customer waiting at the counter, he heads over there to make the sale. If there’s a mess on the shop floor, he runs to clean it up. In between, he hovers anxiously, looking for an opportunity to help out. We’re able to add new responsibilities like this very easily, because of the flexibility of the underlying decision-making architecture. Each state and action is simply a node and edge in a grid. Giving the boy new goals is as simple as identifying which states are required to fulfil a goal. Adding new actions is as easy as describing their requirements and consequences. As new goals, actions, and states are added to the world, the boy can seamlessly plot new paths through this grid of actions, figuring out how to achieve his new goal by assembling a series of actions which will result in the desired states being achieved.
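In the toy planner from earlier, taking on the sales duties amounts to appending two more actions; nothing else needs to change:

```python
SHOP_ACTIONS += [
    ("go to counter", {"customer waiting"},               {"at counter"}),
    ("sell item",     {"customer waiting", "at counter"}, {"sale made"}),
]

# The same plan() function handles the new goal unchanged:
# plan({"customer waiting"}, "sale made", SHOP_ACTIONS)
# -> ['go to counter', 'sell item']
```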

Our boy, hard at work

Our boy (the faithful red dot) zooms about the shop, hunting for messes (green dots), which he cleans by running back and forth over them. Customers (orange dots) come into the shop from the top left, and seek out items (grey dots). When they find them, they pick them up and take them to the counter (bottom right), where they wait for our boy. When the boy sees them, he rushes over to sell them the item, at which point they head back to the top left of the screen to leave the shop.

The outcome of all of this work is a complex dance of interactions on our screen, in the boy’s shop. Customers come and go, busily zipping around hunting down the items they need. The boy eagerly chases after messes on the shop floor, and when he notices customers waiting, he dashes over to take their payments. Despite the rudimentary graphics and the slightly clunky movements, the whole thing takes on a convincing semblance of life.

Each customer is acting on its own simple impulses, and the boy is scarcely more complex, but together, interacting, we observe a fascinating phenomenon: emergence. “Emergence” is what happens when simple systems, interacting in high volumes, create complex behaviours. Two customers race each other for the same item, a consequence of their goal-seeking behaviour. The “avoidance” steering that the boy and the customers share causes them to dance awkwardly around each other when their paths meet. Most interestingly, we find that messes accumulate in the far corners of the shop, away from the counter — the boy is drawn back to the counter to sell things to the customers before he can wander into the far corners to clean.

The boy’s behaviour becomes hard to predict. His internal state is governed by the positions of several ever-changing elements, and it becomes hard to understand what is governing his decisions. Our simulation, the shop, is simplistic enough that, given time, we could break down each action and reaction in terms of the simple algorithms that govern the boy’s behaviour, but as the complexity increases, it becomes increasingly less useful to do so. Almost inevitably, it begins to feel natural to explain the boy’s behaviour in terms of his desires and preferences, in the language we use to describe real-life, human interactions. “He pushed that customer out of the way so he could get to the mess”, “He gets confused when there’s a mess behind a wall”, “He didn’t see those customers until they’d been waiting a while”, “He’s lazy about cleaning the whole shop”.

The last few essays have focused on a branch of artificial intelligence that’s only tangentially related to the world of data science and machine learning, where the most advanced research in this field takes place. In many ways, the algorithms which drive video game “bots” are trivial, and not hugely relevant to the emerging field of computer intelligence. But in another sense, these algorithms are some of the most relevant to humans’ experiences of artificial intelligence. More than almost any other kind of artificial intelligence, video game bots are on the front lines of human/computer relations — they’re the most “human-like” of their kind, despite often being substantially less mathematically sophisticated. That is the crucial insight from these algorithms: it is relatability, not complexity, and empathy, not processing power, that define what we consider to be intelligent, what we consider to be “like us”.

The previous article in this series “Movie Maths: How Computers Understand Text”, is available here. The code for this essay is available from my github, here.

The next essay in this series will be published in June.
