Reinforcement Learning Demystified: Solving MDPs with Dynamic Programming

Episode 4, demystifying dynamic programming, policy evaluation, policy iteration, and value iteration with code examples.

Mohammad Ashraf
May 18, 2018

In the previous two episodes, I illustrated the key concepts and ideas behind MDPs and how they are used to model the environment in a reinforcement learning problem. In this episode, I’ll show, with code examples, how to solve an MDP using dynamic programming, which allows us to perform both prediction and control in any MDP whose dynamics are fully known.
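The post covers prediction (policy evaluation) and control (policy iteration and value iteration). As a rough illustration only, not the author's own code, here is a minimal Python sketch that assumes a hypothetical 4-state corridor: the agent moves left or right, receives a reward of -1 per step, and the episode ends when it reaches state 3. The transition format `P[s][a]` and every function name below are assumptions made for this example. Dynamic programming needs the full model up front. The discount `gamma = 1.0` is safe here only because every policy the code evaluates eventually reaches the terminal state.

```python
# Minimal dynamic-programming sketch on a tiny, fully known MDP.
# ASSUMPTIONS (not from the original post): the 4-state corridor below,
# the transition format P[s][a] = [(prob, next_state, reward, done), ...],
# and every helper name here are hypothetical, chosen for illustration.

N_STATES, N_ACTIONS = 4, 2  # states 0..3; action 0 = left, action 1 = right
TERMINAL = 3                # entering state 3 ends the episode


def build_corridor():
    """Return P[s][a] -> list of (prob, next_state, reward, done) tuples."""
    P = {}
    for s in range(N_STATES):
        P[s] = {}
        for a in range(N_ACTIONS):
            if s == TERMINAL:
                P[s][a] = [(1.0, s, 0.0, True)]  # absorbing, no more reward
            else:
                nxt = max(s - 1, 0) if a == 0 else s + 1
                P[s][a] = [(1.0, nxt, -1.0, nxt == TERMINAL)]  # -1 per step
    return P


def q_value(P, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * (0.0 if done else V[s2]))
               for p, s2, r, done in P[s][a])


def policy_evaluation(P, policy, gamma=1.0, theta=1e-10):
    """Prediction: sweep the Bellman expectation backup until V settles.

    policy[s][a] is the probability of taking action a in state s.
    """
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = sum(policy[s][a] * q_value(P, V, s, a, gamma)
                        for a in range(N_ACTIONS))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update, later states see fresh values
        if delta < theta:
            return V


def greedy_action(P, V, s, gamma):
    """Action with the highest one-step lookahead value (ties -> lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q_value(P, V, s, a, gamma))


def policy_iteration(P, gamma=1.0):
    """Control: alternate full evaluation and greedy improvement until stable."""
    policy = [[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
    while True:
        V = policy_evaluation(P, policy, gamma)
        stable = True
        for s in range(N_STATES):
            best = greedy_action(P, V, s, gamma)
            new_row = [1.0 if a == best else 0.0 for a in range(N_ACTIONS)]
            stable = stable and new_row == policy[s]
            policy[s] = new_row
        if stable:
            return V, policy


def value_iteration(P, gamma=1.0, theta=1e-10):
    """Control: sweep the Bellman optimality backup, then act greedily."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = max(q_value(P, V, s, a, gamma) for a in range(N_ACTIONS))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V, [greedy_action(P, V, s, gamma) for s in range(N_STATES)]


if __name__ == "__main__":
    P = build_corridor()
    random_policy = [[0.5, 0.5] for _ in range(N_STATES)]
    print(policy_evaluation(P, random_policy))  # approximately [-12, -10, -6, 0]
    print(value_iteration(P))  # ([-3.0, -2.0, -1.0, 0.0], [1, 1, 1, 0])
    print(policy_iteration(P)[0])  # [-3.0, -2.0, -1.0, 0.0]
```

For this corridor, the uniform random policy evaluates to roughly [-12, -10, -6, 0], while both control methods recover the shortest-path values [-3, -2, -1, 0] and the policy "always move right" in the non-terminal states. Value iteration folds evaluation and improvement into a single max-backup per sweep, whereas policy iteration fully evaluates each policy before improving it.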

Brace yourself: this post is longer than any of the previous ones, so grab your coffee and dive in.

To continue reading this article, follow this link to my new website, “becomesentient.com”, where I write about all AI-related topics. Thank you for your consideration.


An AI research engineer and an AI and reinforcement learning geek. Twitter: @MhmdElsersy, GitHub: Neo-47