Understanding the basics of machine learning through Super Mario

Adam Wattis
Towards Data Science
5 min read · Sep 13, 2017


Super Mario Land for the Game Boy

There are plenty of articles about neural networks and machine learning out there, as the subject has grown greatly in popularity over the past few years. The area can seem extremely unapproachable and difficult to understand, and many believe that one must be a mathematician or statistician to grasp the concepts of machine learning. However, the basic concepts of machine learning and neural networks are not necessarily as complex as one might think.

The purpose of this article is to explain the high-level concept of how machine learning can work, through a simple example that anybody can understand. Hopefully it will give you the interest and confidence to continue reading and learning more about the subject, and break down the daunting barrier that makes it seem so unapproachable.

Reinforcement learning

A machine learning program differs from a regular program in that its logic is not explicitly defined by the programmer. Instead, the programmer creates a program that has the ability to teach itself how to complete the task at hand. The example I will give is best described as reinforcement learning, which is one branch of machine learning. Such a program takes in input, makes its own decision and produces an output, then learns from the reward yielded by that output, and repeats the entire process over and over again. It might sound abstract, but it is pretty much the same way we humans learn how to do things too. Below I’ll break it down into steps that humans will be able to relate to. And what better way is there to relate to something than Super Mario?
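For readers who prefer to see it in code, the cycle of input, decision, output and reward described above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the action names, the `toy_environment` function and its reward of 1 for moving right are all invented here, and the "decision" is just a random pick, so nothing is being learned yet.

```python
import random

# Hypothetical action names, invented for this illustration.
ACTIONS = ["left", "right", "crouch", "jump"]

def choose_action(observation, rng):
    """Decision step. Here it is only a random pick (no learning yet)."""
    return rng.choice(ACTIONS)

def toy_environment(action):
    """Made-up stand-in for a game: returns (next_observation, reward)."""
    reward = 1 if action == "right" else 0
    return "screen", reward

def run_loop(steps, seed=0):
    rng = random.Random(seed)
    observation = "screen"  # the input
    history = []
    for _ in range(steps):
        action = choose_action(observation, rng)       # decision -> output
        observation, reward = toy_environment(action)  # reward for that output
        history.append((action, reward))  # what a learner would learn from
    return history

history = run_loop(steps=5)
print(history)
```

A real reinforcement learning program would replace the random pick with a decision that improves as the rewards in `history` accumulate.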

To begin our thought experiment you must imagine that you are totally new to computer games. You have never even heard of them, much less ever played one before. Someone then presents to you a game of Super Mario.

Input

You look at the screen and see the simple 2D landscape. You begin by looking at the little figure, which is Mario, and then at all the other objects in the landscape. Everything you see on the screen is your input.

Output

You have four possible ways to interact with the game. You can go left, go right, crouch, or jump. These are your outputs. You decide which output to choose based on the input.

Reward

You are currently unaware of what potential rewards exist in the game, but you will very soon get to experience them first hand. The rewards for each output can vary. If you just walk left or right, the reward is pretty low. If you walk into a coin, the reward is slightly higher. If you jump into a Mystery Box, the reward is higher again. However, if you get hit by an enemy your reward is negative. Needless to say, a negative reward is more like a punishment.
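The four outputs and the rewards described above could be written down as a small lookup table. The numbers below are my own assumptions, picked only so that their ordering matches the text (walking is lowest, then a coin, then a Mystery Box, and an enemy hit is negative). They are not the values of any real game or library.

```python
# Hypothetical names and numbers, chosen only to mirror the ordering in the text.
ACTIONS = ("left", "right", "crouch", "jump")

REWARDS = {
    "walk": 1,             # just walking left or right: a low reward
    "coin": 5,             # walking into a coin: slightly higher
    "mystery_box": 10,     # jumping into a Mystery Box: higher again
    "hit_by_enemy": -100,  # a negative reward, in other words a punishment
}

def reward_for(event):
    """Look up the reward for something that happened after an action."""
    return REWARDS[event]

for event in REWARDS:
    print(event, reward_for(event))
```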

You learn how to play Super Mario

Completely new to how the game is played, you begin by pressing the right arrow. You get rewarded by Mario moving to the right. However, as you continue pressing the right arrow, Mario eventually hits a Goomba, upon which you get rewarded by death!

Mario got rewarded by death for hitting the Goomba

No worries though, you get to start over. This time, when you register the input of the Goomba coming towards you, you attempt other outputs to achieve a different reward. After a few tries you realize that the output with the highest reward in that encounter is to jump on top of the Goomba, or over it. You’re now starting to learn how to play Super Mario.

Mario is rewarded by staying alive for jumping over the Goomba

This is how the concept of reinforcement learning could work in real life. The machine learning program begins as a clean slate, not knowing anything about the task it is supposed to be doing. It then takes an input, which is the environment it has been placed in, and starts trying to figure out which output gives it the highest reward. This concept is very much aligned with how humans learn to do new things through trial and error. We try things one way, then a different way, until we get the reward we want or expect.
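The Goomba encounter above can be turned into a tiny trial-and-error learner. This is a minimal sketch rather than a full reinforcement learning algorithm: it handles a single situation, the rewards are numbers I made up, and it explores using "optimistic initial values", meaning the learner starts out assuming that every untried button might be great, so it tries each one before settling on the best.

```python
ACTIONS = ["left", "right", "crouch", "jump"]

# Hypothetical rewards for one encounter with a Goomba walking towards Mario.
ENCOUNTER_REWARD = {"left": 0.0, "right": -1.0, "crouch": -1.0, "jump": 1.0}

def learn_encounter(attempts, optimistic_start=10.0):
    # Clean slate: every action starts with the same optimistic guess.
    value = {a: optimistic_start for a in ACTIONS}
    tries = {a: 0 for a in ACTIONS}
    chosen = []
    for _ in range(attempts):
        # Pick the action that currently looks best (ties go to the first one).
        action = max(ACTIONS, key=lambda a: value[a])
        reward = ENCOUNTER_REWARD[action]
        tries[action] += 1
        # Running average of the rewards seen for this action; the first
        # real reward fully replaces the optimistic guess.
        value[action] += (reward - value[action]) / tries[action]
        chosen.append(action)
    return value, chosen

value, chosen = learn_encounter(attempts=10)
best = max(ACTIONS, key=lambda a: value[a])
print(chosen)
print("best action:", best)
```

In this sketch the learner walks left, dies twice (once walking right, once crouching), discovers the jump on its fourth attempt, and keeps jumping from then on.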

But everybody already knows how to play Super Mario…

True, nobody really has to learn how to play Super Mario as it is a very intuitive game. Even young children are able to play the game successfully.

The reason for this is that we bring our previous experiences to the game, and can therefore navigate its rules better and attain a higher reward much faster. In machine learning, this would be considered training.

Training

As previously stated, whenever a machine learning program begins trying to figure out how to complete a task, it starts with a clean slate. It is therefore important to give the program some training data as a base to start from, so that it doesn’t make trivial mistakes over and over again before it finally learns what the task is about.

Sure, given enough time, the machine learning algorithm will be able to train itself to complete the given task successfully. This could take a long time though, so it is important to feed your program high-quality training data to get more accurate results.
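One simple way to picture the benefit of a starting base is to compare two learners facing the same Goomba: one starts from a clean slate, the other starts with value estimates that stand in for what it picked up from training data. Everything here is an assumption made for illustration, including the reward numbers and the idea of representing prior training as a pre-filled table.

```python
ACTIONS = ["left", "right", "crouch", "jump"]

# Hypothetical rewards for one encounter with a Goomba.
ENCOUNTER_REWARD = {"left": 0.0, "right": -1.0, "crouch": -1.0, "jump": 1.0}

def count_deaths(initial_value, attempts=10):
    """Play the encounter repeatedly and count how often Mario dies."""
    value = dict(initial_value)
    tries = {a: 0 for a in ACTIONS}
    deaths = 0
    for _ in range(attempts):
        action = max(ACTIONS, key=lambda a: value[a])  # pick what looks best
        reward = ENCOUNTER_REWARD[action]
        if reward < 0:
            deaths += 1
        tries[action] += 1
        # Running average of the rewards seen for this action.
        value[action] += (reward - value[action]) / tries[action]
    return deaths

# Clean slate: knows nothing, so it optimistically tries every action.
clean_slate = {a: 10.0 for a in ACTIONS}

# Hypothetical estimates standing in for what training data would provide.
pretrained = {"left": 0.0, "right": -1.0, "crouch": -1.0, "jump": 1.0}

print("deaths from a clean slate:", count_deaths(clean_slate))
print("deaths with a starting base:", count_deaths(pretrained))
```

The clean-slate learner has to die a couple of times to find out what the prepared learner already knows.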

Conclusion

The key takeaway from this read is that a machine learning algorithm learns to complete tasks in a way that is closer to how humans learn than to how a procedural program works. In a procedural program, the decision making is hard-coded through conditional statements. For example, the programmer would identify the threats in the Super Mario game and enter statements such as: if a threat is detected, do this particular action, or else continue on.

The main difference between this procedural approach and a machine learning approach is that, ultimately, the machine learning program has the autonomy to make its own decisions. This is important to understand because the same program can then be used for other similar tasks without having to be reprogrammed. Our example program could, in principle, go on to learn other games after learning to play Super Mario, and become better and better at playing games the more games it plays. This behaviour is very similar to how a human becomes better and better at playing different games, even though no two games are entirely the same. We use our previous experiences to learn quicker.
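To make the contrast concrete, here is roughly what the hard-coded, procedural approach could look like. The observation strings and the rules are hypothetical; the point is only that every decision is spelled out by the programmer in advance.

```python
def hard_coded_policy(observation):
    """Procedural approach: the programmer writes every rule by hand."""
    if observation == "goomba_ahead":
        return "jump"   # threat detected: do this particular action
    elif observation == "mystery_box_above":
        return "jump"
    else:
        return "right"  # or else continue on

print(hard_coded_policy("goomba_ahead"))
print(hard_coded_policy("some_new_enemy_ahead"))
```

Anything the programmer did not anticipate, such as a new kind of enemy, falls through to the default rule and Mario walks straight into it. A learning program would instead discover the right response from the rewards it receives.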

This article was only a very brief overview of a subject that goes much more in depth. If you’re interested in going deeper into the subject of AI and machine learning I would recommend this article series.

Clap and comment if you liked the article!
