Reinforcement Learning with AWS DeepRacer

From a toy car to AlphaGo and autonomous Teslas

Albert Lai
Towards Data Science

--

In March 2016, Lee Sedol, one of the greatest Go players of the past decade, was defeated 4–1 by AlphaGo. Computers had beaten the best humans at chess before, but Go is on another level of complexity entirely. Do you know what’s even crazier? The machine learned much of its skill by playing millions of games against itself.

In January 2019, Google’s DeepMind team took things to another level with AlphaStar. StarCraft is one of the most complex video games out there, and it had never been mastered by a computer 🤖 before, until now. AlphaStar took on two world-class players and beat them both 5–0.

Just last month, OpenAI (which Elon Musk co-founded) released Multi-Agent Hide and Seek, a simulation with two sides: seekers and hiders. Hiders must avoid the line of sight 👁 of the seekers in a world with walls, boxes, and ramps. A whole plethora of strategies emerged, including shelter building and box surfing! Check out the video below to learn more 👇

OpenAI’s video on their Multi-Agent Hide and Seek Simulation

Last but not least, just 2 weeks ago, OpenAI released yet another amazing result, having trained a successful Robot Hand Rubik’s Cube Solver. Not only can it solve a 3x3 Rubik’s Cube with a single hand about 60% of the time, but it can also do so under interference (like being nudged around by a plush giraffe 🦒 ).

What do all these amazing breakthroughs have in common? They all use reinforcement learning (RL), a currently super-hot 🔥 field of machine learning. At a high level, RL involves an agent in an environment learning what to do through trial and error. It’s employed by robots, self-driving cars, and even stock market predictors!

Speaking of self-driving cars, Amazon Web Services (AWS) recently created a challenge called AWS DeepRacer 🚗 , with the primary purpose of teaching developers about RL. Competitors train a car to drive around a track and race to achieve the fastest time. I decided to take part in the challenge, along with the corresponding Udacity course, to gain a deeper understanding of RL, and I’d like to share what I learned with you! 😃

A More In-Depth Look at RL

In essence, reinforcement learning is modelled after how learning happens in the real world: through evolution, and through the way people 👦 and animals 🐶 learn. Through experience, we humans figure out what to do and what not to do in different scenarios and apply those lessons to our own lives. RL works the same way, simulating experience to help machines learn.

Let’s frame the problem by describing some key terms, using AWS DeepRacer as an example:

  • Agent/Actor: the entity performing actions in the environment (AWS DeepRacer and the models that control it)
  • Actions: what the agent can do (the car can turn, accelerate, etc)
  • Environment: where the agent exists (the track)
  • State: the key characteristics of the agent and environment at a given time (the car’s location on the track)
  • Reward: the feedback given to the agent depending on its action in the previous state (a high reward for doing well, a low reward for doing poorly)

So here’s how a basic RL model learns: an agent in an environment performs an action in its given state. The environment gives the agent a reward 💸 depending on the quality of that action. The agent then performs another action, and the cycle continues. A training episode consists of running this cycle for a certain number of actions.
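To make that loop concrete, here’s a minimal sketch of one training episode in Python. Everything in it is made up for illustration (a toy one-dimensional “track” and a random agent, not DeepRacer’s actual code), but it shows how state, action, and reward flow around the cycle:

```python
import random

class ToyTrackEnv:
    """A made-up, one-dimensional 'track': the state is just the car's
    distance from the centre line; the episode ends if it drifts too far."""
    def __init__(self):
        self.distance_from_center = 0.0

    def reset(self):
        self.distance_from_center = 0.0
        return self.distance_from_center              # initial state

    def step(self, action):
        # action is a steering nudge; the noise stands in for real physics
        self.distance_from_center += action + random.uniform(-0.1, 0.1)
        off_track = abs(self.distance_from_center) > 1.0
        # higher reward near the centre, basically nothing for going off
        reward = 1e-3 if off_track else 1.0 - abs(self.distance_from_center)
        return self.distance_from_center, reward, off_track

def random_policy(state):
    """Stand-in for the model: steer left, right, or not at all, at random."""
    return random.choice([-0.1, 0.0, 0.1])

env = ToyTrackEnv()
state = env.reset()
total_reward = 0.0
for step in range(100):                               # one training episode
    action = random_policy(state)                     # agent picks an action...
    state, reward, done = env.step(action)            # ...environment returns new state + reward
    total_reward += reward
    if done:
        break
print(f"Episode finished after {step + 1} steps, total reward {total_reward:.2f}")
```

A real agent would, of course, use those rewards to update its policy instead of acting randomly, but the state → action → reward cycle is the same.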

Let’s delve a bit deeper into rewards!

What’s in a Reward?

The reward function is one of the most important parts of an RL model. Having one that incentivizes good actions and disincentivizes poor ones is critical to ending up with a well-trained agent.

In AWS DeepRacer, we write our reward function 💰 in terms of input parameters. These include the car’s position, its distance from the centre line, and more. Here’s an example of a basic reward function:

Image by author

In this function, we use the input parameters params['track_width'] and params['distance_from_center']. We create three markers and compare the car’s distance from the centre line with these markers. If the car is within 10% of the track width from the centre, it gets a reward of 1 😊. If it’s within 25%, it gets 0.5. If it’s anywhere else on the track, it gets 0.1. And if it’s off the track, it gets 0.001, which amounts to basically nothing 😢
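In case the screenshot above is hard to read, here is roughly what that reward function looks like in code. It mirrors AWS’s standard “follow the centre line” example, so treat it as a sketch of the idea rather than the exact code in the image:

```python
def reward_function(params):
    '''Reward the car for staying close to the centre line.'''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Markers at 10%, 25%, and 50% of the track width
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0      # hugging the centre line
    elif distance_from_center <= marker_2:
        reward = 0.5      # a bit off-centre, still fine
    elif distance_from_center <= marker_3:
        reward = 0.1      # still on the track, but drifting
    else:
        reward = 1e-3     # likely off the track: basically no reward

    return float(reward)
```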

Image by author

This reward function gives the car a high reward for sticking to the middle of the track. As a result, it learns to avoid going off the track, which is exactly what we want! While this reward function doesn’t optimize for speed, at least we can rest assured that we’ll probably finish the race. 😅

Tuning Hyperparameters

Let's discuss hyperparameters: the settings of the model we can tweak or “tune”. By tuning hyperparameters before training, we can attempt to improve our model’s performance 💯. Here are examples of some hyperparameters we can tune:

  • Learning Rate: how quickly the model learns. If the learning rate is high, it learns quickly but may not be able to reach its optimal state. If it’s too low, it can take a long time to train.
  • Discount Factor: how much the model values future rewards. The higher it is, the more DeepRacer weighs rewards that come later, encouraging actions that pay off down the road. If it’s low, DeepRacer thinks in a short-term manner.
  • Entropy: the amount of randomness in the agent’s action choices. A higher value pushes the model to explore the environment more thoroughly, while a lower value lets it exploit what it already knows.
  • And more!

There are a ton of ways to tune hyperparameters, and there’s no single correct recipe; it comes down to experimentation 🔬🧪 and trial and error.
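For reference, here is one hypothetical starting point, written as a Python dict purely for illustration. In practice you set these values in the DeepRacer console, these are not official defaults, and the “right” numbers depend on your track and reward function:

```python
# A hypothetical starting configuration -- not official defaults,
# just the kind of values you might begin experimenting with.
hyperparameters = {
    "learning_rate": 0.0003,   # smaller = slower but more stable learning
    "discount_factor": 0.99,   # closer to 1 = cares more about future rewards
    "entropy": 0.01,           # higher = more exploration of new actions
    "batch_size": 64,          # experiences used per training update
    "epochs": 10,              # passes over each batch of experience
}
```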

Training and Evaluating our Model

So now that you know all about reinforcement learning for AWS DeepRacer, let’s train and evaluate our virtual model! Here are the steps (note that you can follow along by creating an AWS account):

  1. Create a model in the AWS DeepRacer console.
  2. Set the name, description, and track.
  3. Set your action space (the possible actions the model can take; see the sketch after this list).
  4. Create your reward function.
  5. Set your hyperparameters.
  6. Start training!
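Step 3 deserves a quick illustration. In DeepRacer, a discrete action space is just a grid of (steering angle, speed) pairs. The console builds this grid for you from the maximum steering angle, maximum speed, and granularities you pick, so the little Python sketch below is only there to show the idea; the function name and parameters are my own, not an AWS API:

```python
def build_action_space(max_steering=30.0, steering_levels=5,
                       max_speed=3.0, speed_levels=2):
    """Enumerate every (steering angle, speed) pair the agent may choose from."""
    steering_angles = [
        -max_steering + i * (2 * max_steering) / (steering_levels - 1)
        for i in range(steering_levels)
    ]                                    # e.g. [-30, -15, 0, 15, 30] degrees
    speeds = [
        max_speed * (j + 1) / speed_levels
        for j in range(speed_levels)
    ]                                    # e.g. [1.5, 3.0] m/s
    return [(angle, speed) for angle in steering_angles for speed in speeds]

actions = build_action_space()
print(f"{len(actions)} possible actions, e.g. {actions[0]}")  # 10 actions
```

A bigger action space gives the car finer control but also makes training slower, since there are more options to explore.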

During training, your model drives around the virtual track for the amount of time ⌚ you specified, transitioning from state to state and receiving rewards along the way.

The training dashboard for AWS DeepRacer, including training stats and a video stream; image by author

After training, evaluate the model on any track and see how it does. If it performs well, submit it to the leaderboard 🥇! If not, try changing your reward function and hyperparameters to see how you can improve. It’s all an iterative process, after all 😁

What can we learn from AWS DeepRacer?

I hope you had fun training your model and racing it on the track! Though this is a much simpler model compared to what we’ve seen with OpenAI or Google’s DeepMind, we can still draw some valuable takeaways 🔑

  1. Training a good model takes lots of time, resources, and experimentation. Think of how much time DeepMind spent training AlphaStar! It’s truly an iterative process.
  2. Setting an optimal reward function is crucial. Without a good reward function, the model won’t learn its task properly. If a car isn’t incentivized to stay on the track, you can count on it crashing pretty quickly.
  3. Reinforcement Learning is making waves. AWS has made it super easy for anyone with the resources to build a really good RL model. We’ve seen the crazy advances in the past few months; imagine what we’ll see in the future!

There’s super interesting work being done in the field of RL and you should be excited! Before you know it, you won’t be driving your own car anymore 😏

Photo by Bram Van Oost on Unsplash

Hope you enjoyed reading about RL! I’ll be doing a deeper dive into NLP soon too, so be sure to follow me and connect on LinkedIn! If you want to contact me, shoot me a message at albertlai631@outlook.com. Would love to talk :)

Have a good one and stay tuned for more! 👋


I’m an 18-year-old student who loves technology and life, and trying to get better at both! I mainly write about ML, but also books and philosophy :)