Action Masking with RLlib

Parametric actions to improve reinforcement learning

Christian Hubbs
Towards Data Science
8 min read · Aug 25, 2020

--

Reinforcement learning (RL) algorithms learn via trial and error. Early in training, the agent explores the state space, taking largely random actions to learn which ones lead to a good reward. Pretty straightforward.

Unfortunately, this isn’t terribly efficient, especially if we already know something about which actions are good or bad in certain states. Thankfully, we can use action masking, a simple technique that sets the probability of invalid or known-bad actions to 0, to speed up learning and improve our policies.
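To make the idea concrete, here is a minimal NumPy sketch of what "setting the probability of bad actions to 0" can look like. It is not RLlib's API: the function name `masked_action_probs`, the logits, and the mask values are all made up for illustration. The sketch assumes the policy outputs one logit per discrete action and that the mask uses 1 for allowed actions and 0 for disallowed ones.

```python
import numpy as np


def masked_action_probs(logits, action_mask):
    """Turn logits into action probabilities, giving masked actions probability 0.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(action_mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be allowed")

    # Disallowed actions get a logit of -inf, so exp(-inf) = 0 in the softmax.
    masked_logits = np.where(mask, logits, -np.inf)

    # Standard numerically stable softmax over the remaining actions.
    shifted = masked_logits - masked_logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


# Example: four actions, where the second one is known to be invalid here.
logits = [1.0, 2.0, 0.5, -1.0]
action_mask = [1, 0, 1, 1]
probs = masked_action_probs(logits, action_mask)
```

Note that the masked action had the largest raw logit, yet it can never be sampled. The remaining probability mass is renormalized over the allowed actions, so the agent spends no exploration on actions we already know to be bad.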
