How Does PPO With Clipping Work?

Intuition + math + code, for practitioners

James Koh, PhD
Towards Data Science
9 min readOct 7, 2023

--

Photo by Tamanna Rumee on Unsplash

In Reinforcement Learning, Proximal Policy Optimization (PPO) is often cited as the example for a policy approach, as compared to DQN (a value-based approach) and the big family of actor-critic methods which includes TD3 and SAC.

I recalled some time ago, when I was learning it for the first time, I was left…

--

--