Deep Reinforcement Learning for Drones in 3D realistic environments

Aqeel Anwar
Towards Data Science
9 min read · Oct 23, 2019


Complete code to get you started with implementing Deep Reinforcement Learning in a realistic-looking environment using the Unreal gaming engine and Python.

Note 1: The GitHub repository DRLwithTL mentioned in this article is now outdated. Please use the following, more detailed repository instead: https://github.com/aqeelanwar/PEDRA

Note 2: A more detailed article on drone reinforcement learning can be found here

Overview:

Last week, I made public a GitHub repository containing stand-alone, detailed Python code that implements deep reinforcement learning on a drone in a 3D simulated environment using the Unreal gaming engine. I decided to cover the documentation in detail in this article. The 3D environments are made with the Epic Unreal gaming engine, and Python is used to interface with the environments and carry out deep reinforcement learning using TensorFlow.

Drone navigating in a 3D indoor environment.[4]

At the end of this article, you will have a working platform on your machine capable of implementing Deep Reinforcement Learning for a drone in a realistic-looking environment. You will be able to

  • Design your custom environments
  • Interface it with your Python code
  • Use/modify existing Python code for DRL

For this article, the underlying objective will be autonomous drone navigation. There are no start or end positions; rather, the drone has to navigate for as long as it can without colliding with obstacles. The code can be modified for any user-defined objective.

The complete simulation consists of three major parts

  • 3D Simulation Platform — To create and run simulated environments
  • Interface Platform — To simulate drone physics and interface between Unreal and Python
  • DRL python code Platform — Contains the DRL code based on TensorFlow

There are multiple options for each of these three platforms, but for this article, we will select the following

  • 3D simulation Platform — Unreal Engine [1]
  • Interface Platform — AirSim [2]
  • DRL python code Platform — DRLwithTL GitHub repository [3]

The rest of the article will be divided into three steps

  • Step 1 — Installing the platforms
  • Step 2 — Running the Python code
  • Step 3 — Controlling/modifying the code parameters

Step 1 — Installing the three Platforms:

It’s advisable to make a new virtual environment for this project and install the dependencies there. The following steps can be taken to get started with these platforms.

  1. Clone the repository: The repository containing the DRL code can be cloned using
git clone https://github.com/aqeelanwar/DRLwithTL.git

2. Download ImageNet weights for AlexNet: When initialized, the DNN uses ImageNet-learned weights for AlexNet instead of random weights. This gives the DNN a better starting point for training and helps with convergence.

The following link can be used to download the imagenet.npy file.

Download imagenet.npy

Once downloaded, create a folder ‘models’ in the DRLwithTL root folder and place the downloaded file there.

models/imagenet.npy
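A quick way to sanity-check the download is to load the file in Python and list the stored layers. The snippet below is only a sketch: it assumes imagenet.npy is a pickled dictionary mapping AlexNet layer names to weight arrays, which is the usual format for such files but not confirmed by the repository.

import numpy as np

# Load the pickled weight file. allow_pickle is required because the file
# stores Python objects rather than a plain numeric array (an assumption
# about how imagenet.npy was exported).
weights = np.load('models/imagenet.npy', allow_pickle=True, encoding='latin1').item()

# List the stored layer names to verify the download before training starts.
for layer_name in weights:
    print(layer_name)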

3. Install required packages: The provided requirements.txt file can be used to install all the required packages. Use the following command

cd DRLwithTL
pip install -r requirements.txt

This will install the required packages in the activated python environment.

4. Install Epic Unreal Engine: You can follow the guidelines at the link below to install Unreal Engine on your platform

Instructions on installing Unreal engine

5. Install AirSim: AirSim is an open-source Unreal Engine plugin developed by Microsoft for agents (drones and cars) with physically and visually realistic simulations. To interface between Python and the simulated environment, AirSim needs to be installed. It can be downloaded from the link below

Instructions on installing AirSim

Once everything is installed properly, we can move on to the next step of running the code.

Step 2 — Running DRLwithTL-Sim code:

Once you have the required packages and software downloaded and running, you can take the following steps to run the code

Create/Download a simulated environment

You can either create your own environment manually using Unreal Engine, or download one of the sample environments from the link below and run it.

Download Environments

The following environments are available for download at the link above

  • Indoor Long Environment
  • Indoor Twist Environment
  • Indoor VanLeer Environment
  • Indoor Techno Environment
  • Indoor Pyramid Environment
  • Indoor FrogEyes Environment

The link above will help you download the packaged version of the environment for 64-bit Windows. Run the executable file (.exe) to start the environment. If you have trouble running the environment, make sure the settings.json file in Documents/AirSim has been configured properly. You can use the keys F, M, and backslash to change the camera view in the environment. The keys 1, 2, 3, and 0 can be used to view the FPV, segmentation map, and depth map, and to toggle the sub-window views.
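For reference, a bare-bones settings.json has a structure along the following lines. The values shown are generic AirSim defaults, not the exact settings this repository expects, so treat them only as an illustration of the file's format and check the repository's documentation for the required contents.

{
    "SettingsVersion": 1.2,
    "SimMode": "Multirotor"
}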

Edit the configuration file (Optional)

The RL parameters for the DRL simulation can be set using the provided config file and are explained in the last section.

cd DRLwithTL\configs
notepad config.cfg    # Windows

Run the Python code

The DRL code can be started using the following command

cd DRLwithTL
python main.py

Running main.py carries out the following steps

  1. Attempt to load the config file
  2. Attempt to connect with the Unreal Engine (the indoor_long environment must be running for Python to connect with it, otherwise a connection-refused warning will appear; the code won’t proceed until a connection is established. A minimal connection check is sketched right after this list.)
  3. Attempt to create two instances of the DNN (Double DQN is used) and initialize them with the selected weights.
  4. Attempt to initialize the PyGame screen for the user interface
  5. Start the DRL algorithm
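If you want to verify step 2 independently of main.py, a minimal connection check with the AirSim Python client looks roughly like this (the environment executable must already be running):

import airsim

# Connect to the AirSim plugin running inside the Unreal environment.
client = airsim.MultirotorClient()
client.confirmConnection()     # retries until the simulator responds
client.enableApiControl(True)  # allow the Python client to control the drone
print(client.getMultirotorState())  # prints the drone state if the link is up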

At this point, the drone can be seen moving around in the environment collecting data-points. The block diagram below shows the DRL algorithm used.

Block diagram of DRL Training and associated segments
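In code terms, the loop in the block diagram is an epsilon-greedy Double-DQN cycle over an experience replay buffer. The outline below is a simplified sketch of that cycle, not the repository's actual implementation; env and agent are placeholders for the environment interface and the DQN agent, and the parameter names follow the config file described in Step 3.

import random
from collections import deque

replay_buffer = deque(maxlen=buffer_len)   # experience replay buffer
state = env.reset()

for iteration in range(max_iters):
    # Epsilon-greedy action selection: mostly random early on, mostly DNN later.
    if random.random() < epsilon:
        action = agent.predict_action(state)      # exploitation
    else:
        action = random.randrange(num_actions)    # exploration

    next_state, reward, crashed = env.step(action)
    replay_buffer.append((state, action, reward, next_state, crashed))
    state = env.reset() if crashed else next_state

    # Train on a random mini-batch once enough data points have been collected.
    if iteration > wait_before_train and iteration % train_interval == 0:
        batch = random.sample(replay_buffer, batch_size)
        agent.train_batch(batch)

    # Periodically swap the roles of the two Q-networks (Double DQN).
    if iteration % update_target_interval == 0:
        agent.update_target_network()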

Viewing learning parameters using TensorBoard

During the simulation, RL parameters such as epsilon, learning rate, average Q values, loss, and return can be viewed on TensorBoard. The path of the TensorBoard log files depends on the env_type, env_name, and train_type set in the config file and is given by

models/trained/<env_type>/<env_name>/Imagenet/   # Generic path
models/trained/Indoor/indoor_long/Imagenet/ # Example path

Once you have identified where the log files are stored, the following command can be used in the terminal to activate TensorBoard.

cd models/trained/Indoor/indoor_long/Imagenet/
tensorboard --logdir <train_type> # Generic
tensorboard --logdir e2e # Example

The terminal will display a local URL that can be opened in any browser, and the TensorBoard display will appear, plotting the DRL parameters at run time.

Run-time controls using PyGame screen

DRL is notoriously data-hungry. For a complex task such as autonomous drone navigation in a realistic-looking environment using only the front camera, the simulation can take hours of training (typically 8 to 12 hours on a GTX 1080 GPU) before the DRL converges. If, in the middle of the simulation, you feel you need to change a few DRL parameters, you can do so through the PyGame screen that appears during the simulation. This can be done with the following steps

  1. Change the config file to reflect the modifications (for example, decrease the learning rate) and save it.
  2. Select the PyGame screen and hit ‘backspace’. This will pause the simulation.
  3. Hit the ‘L’ key. This will load the updated parameters and print them on the terminal.
  4. Hit the ‘backspace’ key again to resume the simulation.

Right now, the simulation only updates the learning rate. Other variables can be updated too by editing the aux_function.py file, specifically the check_user_input module, at the following lines.

Editing check_user_input module to update other parameters too

The cfg variable at line 187 holds all the updated parameters; you only need to assign it to the corresponding variable and return the value for it to take effect.
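As a hypothetical illustration of that pattern (the variable names below are assumptions, not the repository's), the change amounts to reading the extra parameter from the reloaded cfg and returning it alongside the values the module already returns:

# Hypothetical fragment inside check_user_input, after cfg has been reloaded
# (line 187). Mirror whatever the original module already assigns and returns.
lr = cfg.lr                    # learning rate: already exposed this way
batch_size = cfg.batch_size    # an additional parameter you want to update
# ...then add batch_size to the module's existing return statement and use it
# in the training loop.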

Step 3 — Control/Modify Parameters in DRLwithTL-Sim:

The code gives you the control to

  1. Change the DRL configurations
  2. Change the Deep Neural Network (DNN)
  3. Modify the drone action space

1. Change the DRL configurations:

The provided config file can be used to set the DRL parameters before starting the simulation.

Config file used for DRL and sample values
  • num_actions: Number of actions in the action space. The code uses a perception-based action space [4], dividing the camera frame into a grid of sqrt(num_actions) x sqrt(num_actions) bins.
  • train_type: Determines the number of layers to be trained in the DNN. The supported values are e2e, last4, last3, and last2. More values can be defined in the code if needed.
  • wait_before_train: Sets the iteration at which training should begin. The simulation collects this many data points before it starts the training phase.
  • max_iters: Determines the maximum number of iterations used for DRL. The simulation stops once this many iterations have been completed.
  • buffer_len: Sets the size of the experience replay buffer. The simulation keeps collecting data points and storing them in the replay buffer; data points are sampled from this buffer and used for training.
  • batch_size: Determines the batch size in one training iteration.
  • epsilon_saturation: The epsilon-greedy method is used to transition from the exploration to the exploitation phase. As the iteration count approaches this value, epsilon approaches 0.9, i.e. 90% of actions are predicted through the DNN (exploitation) and only 10% are random (exploration).
  • crash_threshold: This value is used along with the depth map to determine when the drone is considered to have virtually crashed. When the average depth to the closest obstacle in the dynamic window at the center of the depth map falls below this value, a reward of -1 is assigned to the data tuple.
  • Q_clip: If set to True, Q values beyond a certain magnitude are clipped, which helps the DRL converge.
  • train_interval: This value determines how often training happens. For example, if set to 3, training happens after every 3 iterations.
  • update_target_interval: The simulation uses a Double DQN approach to help the DRL loss converge. update_target_interval determines how often the simulation switches between the two Q-networks.
  • dropout_rate: Determines the probability with which connections are dropped out during training to avoid over-fitting.
  • switch_env_steps: Determines how often the drone changes its initial positions. These initial positions are set in environments/initial_positions.py under the corresponding environment name.
  • epsilon_model: linear or exponential. Determines how epsilon grows with the iteration count; one possible interpretation is sketched right after this list.
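To make the last two epsilon parameters concrete, here is one possible interpretation of the schedule as a small Python function. The exact formula used in the repository may differ; this sketch only captures the behaviour described above, with epsilon (the probability of taking the DNN-predicted action) growing towards 0.9 as the iteration count approaches epsilon_saturation.

import math

def epsilon_at(iteration, epsilon_saturation, epsilon_model='linear'):
    # Probability of exploitation (DNN-predicted action) at a given iteration.
    # Illustrative formulas only; the repository may implement these differently.
    if epsilon_model == 'linear':
        return 0.9 * min(iteration / epsilon_saturation, 1.0)
    else:  # 'exponential'
        return 0.9 * (1.0 - math.exp(-3.0 * iteration / epsilon_saturation))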

2. Change the DNN topology:

The DNN used for mapping the states to their Q values can be modified in the following Python file.

network/network.py       #Location of DNN

Different DNN topologies can be defined as classes in this Python file. The code comes with three different versions of a modified AlexNet network, and more networks can be defined according to user needs if required. Once a new network is defined, the network/agent.py file can be modified to use the required network on line 30.
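As a rough illustration of what defining a new topology as a class might look like (TensorFlow 1.x style; the constructor signature and attribute names are assumptions, not the repository's actual interface):

import tensorflow as tf

class SmallConvNet:
    # Illustrative topology only. Match the constructor arguments and output
    # attribute of the existing AlexNet classes in network/network.py.
    def __init__(self, x, num_actions):
        conv1 = tf.layers.conv2d(x, filters=32, kernel_size=5, strides=2,
                                 activation=tf.nn.relu, name='conv1')
        conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3, strides=2,
                                 activation=tf.nn.relu, name='conv2')
        flat = tf.layers.flatten(conv2)
        fc1 = tf.layers.dense(flat, 256, activation=tf.nn.relu, name='fc1')
        # One Q-value estimate per discrete action.
        self.output = tf.layers.dense(fc1, num_actions, name='q_values')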

3. Modify the drone Action Space:

The current version of the code supports a perception-based action space. Changing the num_actions parameter in the config file changes the number of bins the front-facing camera frame is divided into.

Perception-based action space — Default action space used in the DRLwithTL code [4]

If an entirely different type of action space needs to be used, the user can define it by modifying the following module

Module:    take_action
Location: network/agent.py

If modified, this module should map the action number (say 0, 1, 2, …, num_actions - 1) to corresponding yaw and pitch values for the drone.
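For instance, a replacement mapping along the lines of the default perception-based scheme could look like the sketch below: the action index selects one cell of a sqrt(num_actions) x sqrt(num_actions) grid over the camera frame, and the cell's offset from the image centre is converted into yaw and pitch commands. The function name, field-of-view value, and gains are illustrative assumptions, and the actual take_action signature in network/agent.py may differ.

import math

def action_to_yaw_pitch(action, num_actions, fov_deg=90.0):
    # Which grid cell does this action index correspond to?
    grid = int(math.sqrt(num_actions))
    row, col = divmod(action, grid)
    # Offset of the selected cell centre from the image centre, in [-0.5, 0.5].
    x_off = (col + 0.5) / grid - 0.5
    y_off = (row + 0.5) / grid - 0.5
    yaw = x_off * fov_deg      # turn towards the selected cell horizontally
    pitch = -y_off * fov_deg   # pitch towards the selected cell vertically
    return yaw, pitch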

Summary:

This article aimed to get you started with a working platform for deep reinforcement learning in a realistic 3D environment. It also points out the parts of the code that can be modified according to user needs. The complete platform in action can be seen in [4].

References:

  1. https://www.unrealengine.com
  2. https://github.com/microsoft/airsim
  3. https://github.com/aqeelanwar/DRLwithTL.git
  4. http://arxiv.org/abs/1910.05547

If this article was helpful to you, feel free to clap, share, and respond to it. If you want to learn more about Machine Learning and Data Science, follow me @Aqeel Anwar or connect with me on LinkedIn.
