Policy Gradients In Reinforcement Learning Explained
Learn all about policy gradient algorithms based on likelihood ratios (REINFORCE): the intuition, the derivation, the ‘log trick’, and update rules for Gaussian and softmax policies.
Published in
15 min readApr 9, 2022