Action Masking with RLlib
Parametric actions to improve reinforcement learning
8 min read · Aug 25, 2020
RL algorithms learn via trial and error. Early in training, the agent explores the state space by taking random actions and learns which ones lead to good rewards. Pretty straightforward.
Unfortunately, this isn’t terribly efficient, especially if we already know something about what makes a good vs. bad action in some states. Thankfully, we can use action masking — a simple technique that sets the probability of bad actions to 0 — to speed up learning and improve our policies.
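The core trick can be sketched in a few lines of plain NumPy: push the logits of invalid actions to a large negative value before the softmax, so those actions end up with effectively zero probability. This is a minimal illustration under my own naming (`mask_logits`, `softmax` are helpers written here), not RLlib's actual implementation.

```python
import numpy as np

def mask_logits(logits, mask):
    """Add a large negative value to the logits of invalid actions,
    so softmax assigns them ~zero probability.
    mask: 1 = valid action, 0 = invalid action."""
    return logits + np.where(mask.astype(bool), 0.0, -1e9)

def softmax(x):
    # Shift by the max for numerical stability before exponentiating.
    z = np.exp(x - x.max())
    return z / z.sum()

logits = np.array([1.0, 2.0, 0.5, 1.5])
mask = np.array([1, 0, 1, 0])  # actions 1 and 3 are invalid here

probs = softmax(mask_logits(logits, mask))
# Invalid actions receive (near-)zero probability; valid ones
# share the remaining probability mass.
```

The same idea carries over to a neural policy: the mask is applied to the network's output logits each step, so the sampled action is always a valid one and gradients never push probability toward known-bad actions.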