My Journey to Reinforcement Learning — Part 0: Introduction

Jae Duk Seo
Towards Data Science
5 min read · Apr 6, 2018

Gif from website

Currently, I know very little about Reinforcement Learning, and I want to change that, so here is my first step in learning it. As a first step, I wish to cover high-level overviews.

Please note that this post is for my future self, and my learning process might be slow or different from yours.

Reinforcement Learning Tutorial by Peter Bodík, UC Berkeley

From this lecture, I learned that reinforcement learning is more general than supervised or unsupervised learning. However, there still seems to be a notion of a goal, hence I assume there is going to be a certain cost function to measure how close we are to achieving that goal. Below is a very good summary of what reinforcement learning might be.

Right Image → Optimal solution (no reward for each step)
Middle Image → Solution when the reward for each step is -0.1
Left Image → Solution when the reward for each step is 0.01

The images above are a perfect example (for me) of how complex reinforcement learning can be. If we make a robot whose objective is to earn the most points, the optimal solution would be the rightmost image. However, depending on the reward design (here, the reward for each step), the solution that the robot learns is drastically different. From here the ppt goes into quite a lot of math, which I won't include in a high-level overview, but the ppt had a very good summary page.
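The effect of the per-step reward can be reproduced with plain value iteration. Below is a minimal sketch on the classic 4x3 gridworld; the layout, deterministic moves, and all names are my own illustrative assumptions, not taken from the slides. With a negative step reward the computed policy heads straight for the +1 terminal; making the step reward less negative (or positive) changes which paths the values favor.

```python
# Value iteration on a 4x3 gridworld (illustrative setup, deterministic
# moves for simplicity). Changing step_reward changes the learned policy.
def value_iteration(step_reward, gamma=1.0, iters=100):
    rows, cols = 3, 4
    terminals = {(0, 3): 1.0, (1, 3): -1.0}   # +1 goal, -1 trap
    walls = {(1, 1)}
    states = [(r, c) for r in range(rows) for c in range(cols)
              if (r, c) not in walls]
    moves = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

    def step(s, a):
        nxt = (s[0] + moves[a][0], s[1] + moves[a][1])
        return nxt if nxt in states else s   # bump into wall/edge: stay put

    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if s in terminals:
                V[s] = terminals[s]
            else:  # Bellman backup: best action's reward plus next value
                V[s] = max(step_reward + gamma * V[step(s, a)] for a in moves)
    policy = {s: max(moves, key=lambda a: V[step(s, a)])
              for s in states if s not in terminals}
    return V, policy
```

For example, `value_iteration(-0.1)` yields a policy that moves right along the top row toward the +1 terminal, while a reward of 0 per step leaves many equally good paths.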

From the above, I learned when reinforcement learning is used, and that the most challenging part of reinforcement learning is actually designing the features, states, and rewards.

International Conference on Machine Learning (ICML 2007) Tutorial

From the previous presentation, we already learned that the challenging part is designing the states and rewards. The acronyms at the bottom stand for Markov decision process (MDP) and partially observable Markov decision process (POMDP).
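An MDP is usually written as a tuple of states, actions, transition probabilities, and rewards. As a sketch of what that tuple looks like as plain data, here is a tiny two-state example I made up for illustration (it is not from the tutorial):

```python
# A tiny hand-written MDP (S, A, P, R); the two-state example is invented.
states = ['rested', 'tired']
actions = ['work', 'sleep']

# P[(state, action)] -> {next_state: probability}
P = {
    ('rested', 'work'):  {'tired': 0.8, 'rested': 0.2},
    ('rested', 'sleep'): {'rested': 1.0},
    ('tired', 'work'):   {'tired': 1.0},
    ('tired', 'sleep'):  {'rested': 0.9, 'tired': 0.1},
}

# R[(state, action)] -> expected immediate reward
R = {('rested', 'work'): 1.0, ('rested', 'sleep'): 0.0,
     ('tired', 'work'): 0.2, ('tired', 'sleep'): 0.0}

# Sanity check: every transition distribution sums to 1
for dist in P.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```

A POMDP extends this tuple with observations the agent receives instead of the true state, which is why the partially observable case is harder.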

A perfect example of states, actions, and rewards is shown above. We can see that this setting can easily be applied to any game (chess, StarCraft, or even real-world settings).
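The states/actions/rewards setting boils down to a simple interaction loop: the agent picks an action, the environment returns the next state and a reward, and this repeats until the episode ends. Below is a minimal self-contained sketch of that loop; the "walk to position 5" chain environment and the random policy are my own toy assumptions, loosely following the reset/step convention popularized by Gym-style libraries.

```python
# A minimal agent-environment loop showing states, actions, and rewards.
# The chain environment is invented for illustration.
import random

class ChainEnv:
    """States 0..5; actions -1/+1 move the agent; reaching 5 pays +1."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(5, self.state + action))
        done = self.state == 5
        reward = 1.0 if done else -0.1   # small penalty per step
        return self.state, reward, done

env = ChainEnv()
state, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])      # a random policy, for now
    state, reward, done = env.step(action)
    total += reward
```

A learning agent would replace `random.choice` with a policy that improves from the observed rewards, which is exactly what the algorithms later in these slides do.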

The slides also cover different types of learning algorithms. From here I learned that, just as classification has families of algorithms such as SVMs, neural networks, and k-nearest neighbors, reinforcement learning has its own families of algorithms.

Simple Beginner’s guide to Reinforcement Learning & its implementation (analyticsvidhya)

Image from this website

When we google reinforcement learning, we see images like the one above over and over again. So rather than talking abstractly about an agent and an environment, let's actually think of this as the process of a baby learning how to walk.

Image from this website

“The ‘problem statement’ of the example is to walk, where the child is an agent trying to manipulate the environment (which is the surface on which it walks) by taking actions (viz walking) and he/she tries to go from one state (viz each step he/she takes) to another. The child gets a reward (let’s say chocolate) when he/she accomplishes a sub module of the task (viz taking couple of steps) and will not receive any chocolate (a.k.a negative reward) when he/she is not able to walk. This is a simplified description of a reinforcement learning problem.” — Faizan Shaikh

Image from this website

The author actually gives a long explanation of how these paradigms differ; if you wish to read it, please click here. In short, in one or two sentences each:

Supervised vs RL: Both learn a mapping between input and output, but in RL there is a reward function to evaluate the actions the agent takes, in addition to a measure of whether the final goal was met. (e.g., winning a chess game → winning is what matters, but there are multiple ways to win.)
Unsupervised vs RL: Unsupervised learning is (mostly) about finding patterns in the underlying data and clustering it, whereas RL learns behavior from reward signals.

Final Words

One more post, “Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG)”, is an excellent article for learning about the different types of learning algorithms. Overall, there are millions of resources on the internet, so anyone who wishes to learn RL won't have a hard time finding them.

Reference

  1. (2018). Cs.uwaterloo.ca. Retrieved 6 April 2018, from https://cs.uwaterloo.ca/~ppoupart/ICML-07-tutorial-slides/icml07-brl-tutorial-part2-intro-ghavamzadeh.pdf
  2. People.eecs.berkeley.edu. (2018). Retrieved 6 April 2018, from https://people.eecs.berkeley.edu/~jordan/MLShortCourse/reinforcement-learning.ppt
  3. Partially observable Markov decision process. (2018). En.wikipedia.org. Retrieved 6 April 2018, from https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process
  4. Markov decision process. (2018). En.wikipedia.org. Retrieved 6 April 2018, from https://en.wikipedia.org/wiki/Markov_decision_process
  5. Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG). (2018). Towards Data Science. Retrieved 6 April 2018, from https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287
