Getting Started

Ultimate Guide to Reinforcement Learning Part 1 — Creating a Game

In this comprehensive article series, we will build our own environment. Later, we will train a neural network using reinforcement learning. Finally, we will create a video showing the AI playing the environment.

Daniel Brummerloh
Towards Data Science
7 min read · Nov 12, 2020


The complete code of the environment, the training and the rollout can be found on GitHub: https://github.com/danuo/rocket-meister/

What we will cover:

Part 1 — Creation of a playable environment with Pygame

  • Create an environment as a gym.Env subclass.
  • Implement the environment logic through the step() function.
  • Acquire user input with Pygame to make the environment playable for humans.
  • Implement a render() function with Pygame to visualize the environment state.
  • Implement interactive level design with Matplotlib.

Part 2 — Training a Neural Network with Reinforcement Learning

https://medium.com/@d.brummerloh/ultimate-guide-for-ai-game-creation-part-2-training-e252108dfbd1

  • Define suitable observations while understanding the possibilities and challenges.
  • Define suitable rewards.
  • Train neural networks with the gym environment.
  • Discuss the results.

This is the first part of the series. We will implement the game logic, acquire user input data for the controls and implement rendering to make it possible for humans to play the game. For this, we will be using a popular Python package called Pygame.

Requirements

As the model we are going to train is relatively small, the training can be performed on a consumer-level desktop CPU in a reasonable amount of time (less than a day). You do not need a powerful GPU or access to a cloud computing network. The Python packages used in this guide are listed below:

Python 3.8.x
ray 1.0
tensorflow 2.3.1
tensorflow-probability 0.11
gym 0.17.3
pygame 2.0.0

The Environment

In the context of reinforcement learning, an environment can be seen as an interactive problem that needs to be solved in the best way possible.

The agent-environment interaction.

To quantify the success, a reward function is defined within the environment. The agent can see the so-called observations, which give information about the current state of the environment. It can then take a specific action, which returns the observation and the scalar reward of the environment’s next state. The agent’s goal is to maximize the sum of rewards achieved in a limited number of steps.

From a technical standpoint, there are many different ways to build an environment. The best way, though, is to adopt the structure defined in the gym package. The gym package is a collection of ready-to-use environments providing a de facto standard API for reinforcement learning. All gym environments share the same names for functions and variables, which makes the environment and the agent easily interchangeable. To adopt the gym structure, we will make our environment a subclass of the gym.Env class. The fundamental and mandatory elements of the class are shown below:
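
A minimal sketch of such a subclass might look like this. The spaces and return values below are placeholders for illustration; the full implementation can be found in the linked repository.

import gym
import numpy as np

class CustomEnv(gym.Env):
    def __init__(self, env_config={}):
        # placeholder spaces, replaced by the real definitions later on
        self.observation_space = gym.spaces.Box(low=-1., high=1., shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(low=-1., high=1., shape=(2,), dtype=np.float32)

    def reset(self):
        # restore the initial state and return the first observation
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        return obs

    def step(self, action):
        # apply the action and advance the environment by one step
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward = 0.0
        done = False
        info = {}
        return obs, reward, done, info

    def render(self):
        # visualize the current state (implemented later with Pygame)
        pass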

Most of these functions and variables will be discussed in more depth later on. Here is a small summary with the most important items listed first:

  • action (object): The action to be performed in the step() function. In a game of chess, the action would be the specific, legal move performed by a player.
  • observation (object): This is all the information available to the agent to choose the next action. The observation is based only on the current state of the environment.
  • reward (float): This is the reward gained by the last performed action, i.e. during the previous step. The AI will try to maximize the total reward. The reward can also be negative.
  • done (boolean): If set to true, the environment has reached an end point. No more actions can be performed and the environment needs to be reset.
  • info (dict): Allows extracting environment data for debugging purposes. The data is not visible to the agent.
  • env_config (dict): This optional dictionary can be used to configure the environment.
  • observation_space and action_space: As you might imagine, only certain actions and observations are valid for a specific environment. To define their format, the observation_space and action_space variables need to be assigned to a respective gym.space class. Spaces can differ in their dimensionality and their value range. Continuous and discrete spaces are both possible. For more information on the gym spaces, have a look at the documentation and the gym GitHub.
self.observation_space = <gym.space>
self.action_space = <gym.space>

Example: Definition of action space

As seen in the video, we want to control a rocket which can be accelerated forwards/backwards (action 1) and rotated left/right (action 2). Thus, we define the actions as a vector of size 2.

The value of each vector entry is continuous and must lie in the range [-1, 1]. The corresponding gym space is defined in the following line of code:

gym.spaces.Box(low=-1., high=1., shape=(2,), dtype=np.float32)

Pygame Implementation

Pygame is a Python library designed for the creation of simple games. Key features are 2D rendering, user input acquisition and options for audio output. The following section will cover a very basic Pygame implementation with the bare minimum of features. If you are more ambitious, you can consider implementing features such as a dynamic frame rate or dynamic resolution.

Rendering

To render in Pygame, we need to create a window (also called surface) to draw the visual output on.

window = pygame.display.set_mode((window_width, window_height))

Next, we can queue draw calls for the created window. You can find an overview of the available draw calls in the Pygame documentation. We will implement a couple of exemplary draw calls in a new function that we add to our CustomEnv class. The function is called render() and looks as follows:
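
A minimal sketch of such a method is shown below. It assumes pygame has been imported and that the window surface created above is stored in self.window; the shapes drawn here are arbitrary placeholders, the actual environment draws the rocket and the level geometry.

def render(self):
    # a few exemplary draw calls on the window surface
    self.window.fill((0, 0, 0))                                        # black background
    pygame.draw.rect(self.window, (255, 0, 0), (100, 100, 50, 30))     # red rectangle
    pygame.draw.circle(self.window, (0, 255, 0), (300, 200), 20)       # green circle
    pygame.draw.line(self.window, (255, 255, 255), (0, 450), (1000, 450), 2)  # white line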

After the draw calls are made, the window needs to be updated and actually rendered with the pygame.display.update() command.

Basic Render Loop

Now it’s time to put it all together by creating a render loop routine that can keep our environment running. We initialize Pygame with pygame.init(). Then we create a clock object that can maintain a static frame rate in combination with tick(fps). We create a window of 1000*500 pixels for the visual output. Then we start a while loop that performs step() and render() once before each frame is generated with update(). Obviously, this render loop only makes sense if render() actually reflects the changes induced by step().
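
A bare-bones version of this loop could look like the following sketch. The zero action and the env.window attribute are illustrative assumptions; keyboard controls are added in the next section.

import numpy as np
import pygame

env = CustomEnv()

pygame.init()
clock = pygame.time.Clock()
env.window = pygame.display.set_mode((1000, 500))   # 1000*500 pixel window

running = True
while running:
    action = np.array([0., 0.], dtype=np.float32)   # placeholder action
    env.step(action)
    env.render()
    pygame.display.update()
    clock.tick(60)                                  # maintain a frame rate of 60 fps
    for event in pygame.event.get():
        if event.type == pygame.QUIT:               # allow closing the window
            running = False

pygame.quit()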

User Input

Pygame offers two ways to acquire user input data from the keyboard:

  • The first one is called pygame.event.get() and will generate an event whenever a key state changes from unpressed to pressed or vice versa. Other things, such as closing the Pygame window, will also create an event. The latter (event.type == pygame.QUIT) enables us to end the while-loop and the Python script without crashing. A list of key constants can be found in the Pygame documentation.
  • The second method is called pygame.key.get_pressed() and will return a tuple of Booleans with one entry per key on the keyboard. The value is 0 for non-pressed keys and 1 for pressed keys. To evaluate key states, we index the tuple with the key constants, for example pygame.K_UP for the up arrow key. A minimal sketch combining both methods is shown below.
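
The following sketch shows how both methods could be used inside the render loop. The mapping of keys to action components is an assumption for illustration; the repository may map the controls differently.

# inside the render loop: translate the keyboard state into an action
for event in pygame.event.get():
    if event.type == pygame.QUIT:        # the window was closed
        running = False

keys = pygame.key.get_pressed()          # tuple with one entry per key
action = np.array([0., 0.], dtype=np.float32)
if keys[pygame.K_UP]:
    action[0] = 1.                       # accelerate forwards
if keys[pygame.K_DOWN]:
    action[0] = -1.                      # accelerate backwards
if keys[pygame.K_LEFT]:
    action[1] = -1.                      # rotate left
if keys[pygame.K_RIGHT]:
    action[1] = 1.                       # rotate right

env.step(action)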

Kinematics

Next, we will implement the kinematics of the rocket. While we use a simple approach for the rotation, the translational movement will have inertia. Mathematically, the rocket’s trajectory is the solution of the Equation of Motion and is smooth. The position cannot jump, but needs to change continuously instead.

Since we are fine with an approximate solution, we can calculate the trajectory using a time discretization with the forward Euler method. A simple and minimal 2D implementation is shown in the following code:
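
A sketch of one integration step could look like this. The thrust and rotation constants are illustrative and not the values used in the repository.

import numpy as np

# state of the rocket
pos = np.array([500., 250.])        # position in pixels
vel = np.array([0., 0.])            # velocity in pixels per second
angle = 0.                          # orientation in radians

dt = 1 / 60                         # time step: one frame at 60 fps
action = np.array([1., 0.])         # example action: full thrust, no rotation

# rotation is applied directly, without inertia
angle += 0.05 * action[1]

# translation has inertia: the thrust action only changes the acceleration
acc = 100. * action[0] * np.array([np.cos(angle), np.sin(angle)])

# forward Euler: integrate acceleration into velocity, velocity into position
vel = vel + acc * dt
pos = pos + vel * dt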

Now we incorporate everything (game logic, input and rendering) into our previously defined CustomEnv class. We will also move everything related to Pygame into the render() function and a separate init function. This way, we can execute machine learning routines with step() and reset() without importing the comparatively heavy Pygame package. If the environment is loaded for the AI training, the rendering is not needed and the performance can be increased.
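
One possible way to achieve this separation is to hide all Pygame calls behind a lazy initialization, sketched below with a hypothetical init_render() helper. Only the rendering-related methods are shown; step() and reset() stay as before.

class CustomEnv(gym.Env):
    # __init__(), reset() and step() remain free of any Pygame code

    def init_render(self):
        # import and initialize Pygame only when rendering is requested
        import pygame
        self.pygame = pygame
        pygame.init()
        self.window = pygame.display.set_mode((1000, 500))
        self.clock = pygame.time.Clock()

    def render(self):
        if not hasattr(self, 'window'):
            self.init_render()
        self.window.fill((0, 0, 0))
        # ... draw the rocket and the level here ...
        self.pygame.display.update()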

Here is the above code running with some keyboard inputs.

Level Design

For now, we will be using a manually created static level. Creating one can be a tedious task. We will use Matplotlib to make our lives a little bit easier. With the plt.ginput() function, coordinates can be captured by clicking inside the figure.
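
A short sketch of this workflow might look as follows; the number of points and the axis limits are arbitrary choices for illustration.

import matplotlib.pyplot as plt

# open an empty figure spanning the size of the game window
plt.figure()
plt.xlim(0, 1000)
plt.ylim(0, 500)

# click inside the figure to collect up to 100 points,
# stop early with the middle mouse button
points = plt.ginput(n=100, timeout=0)
print(points)        # list of (x, y) tuples, ready to be copied into the code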

These coordinates will be printed into the console, from where you can copy them into your code. A little reformatting should make it possible to include them in our environment, for example by storing them in a NumPy array as seen in rocket_gym.py.

Collision Detection

Let’s assume we have the level boundary stored as an array of size n*4, each row holding the two end points of one segment:
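
A toy example with four segments forming a rectangular boundary could look like this; the actual level coordinates live in rocket_gym.py.

import numpy as np

# each row is one wall segment: x1, y1, x2, y2
level = np.array([
    [100., 100., 900., 100.],
    [900., 100., 900., 400.],
    [900., 400., 100., 400.],
    [100., 400., 100., 100.],
])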

Whether two line segments intersect can be checked with the following function. If the segments intersect, the coordinates of the intersection point are returned. If there is no intersection, None is returned.
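
A sketch of such a function is shown below; the variable names are illustrative and the repository’s implementation may differ in detail.

def get_intersection(p0, p1, p2, p3):
    # direction vectors of segment p0-p1 and segment p2-p3
    s10_x, s10_y = p1[0] - p0[0], p1[1] - p0[1]
    s32_x, s32_y = p3[0] - p2[0], p3[1] - p2[1]

    denom = s10_x * s32_y - s10_y * s32_x
    if denom == 0:
        return None                          # the segments are parallel

    s02_x, s02_y = p0[0] - p2[0], p0[1] - p2[1]
    t = (s32_x * s02_y - s32_y * s02_x) / denom
    s = (s10_x * s02_y - s10_y * s02_x) / denom

    if 0 <= t <= 1 and 0 <= s <= 1:
        # the intersection lies within both segments
        return (p0[0] + t * s10_x, p0[1] + t * s10_y)
    return None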

Now we can apply this check to our problem by testing whether any of the environment boundaries intersect with the movement vector of the rocket.
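
A sketch of this test, reusing the intersection function from above, might look like this; the helper name detect_collision is hypothetical.

def detect_collision(level, pos_old, pos_new):
    # test the movement vector of the rocket against every wall segment
    for x1, y1, x2, y2 in level:
        hit = get_intersection((x1, y1), (x2, y2), pos_old, pos_new)
        if hit is not None:
            return hit        # the rocket crossed this boundary segment
    return None               # no collision during this step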

Continue in Part 2

In the second part, we will discuss and implement the observations and rewards that are returned by the environment. After that, the actual training is conducted. Have a read over here:

https://medium.com/@d.brummerloh/ultimate-guide-for-ai-game-creation-part-2-training-e252108dfbd1
