How Does PPO With Clipping Work?
Intuition + math + code, for practitioners
Published in
9 min readOct 7, 2023
In Reinforcement Learning, Proximal Policy Optimization (PPO) is often cited as the example for a policy approach, as compared to DQN (a value-based approach) and the big family of actor-critic methods which includes TD3 and SAC.
I recalled some time ago, when I was learning it for the first time, I was left…