Policy Gradients In Reinforcement Learning Explained

Learn all about policy gradient algorithms based on likelihood ratios (REINFORCE): the intuition, the derivation, the ‘log trick’, and update rules for Gaussian and softmax policies.

Wouter van Heeswijk, PhD
Towards Data Science
15 min readApr 9, 2022

--

Photo by Scott Webb on Unsplash

--

--

Assistant professor in Financial Engineering and Operations Research. Writing about reinforcement learning, optimization problems, and data science.