A Minimal Working Example for Discrete Policy Gradients in TensorFlow 2.0
A multi-armed bandit example for training discrete actor networks. With the aid of TensorFlow 2's GradientTape, the actor network can be trained in only a few lines of code.
6 min read · Sep 4, 2020
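To make the idea concrete, here is a minimal sketch of a discrete policy gradient update with tf.GradientTape on a multi-armed bandit. It is not necessarily the author's exact code. It assumes a hypothetical three-armed bandit with Gaussian rewards whose means (TRUE_MEANS) are made up for illustration. The names actor and train_step are also illustrative. The loss is the standard REINFORCE estimator: the negative log-probability of the sampled arm, weighted by the observed reward.

```python
import numpy as np
import tensorflow as tf

# Hypothetical bandit (assumption): mean reward of each arm, Gaussian noise with std 0.1
TRUE_MEANS = np.array([0.2, 0.5, 0.8], dtype=np.float32)
N_ARMS = len(TRUE_MEANS)

rng = np.random.default_rng(0)
tf.random.set_seed(0)

# Actor network: maps a constant dummy state to one logit per arm
actor = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(N_ARMS),  # raw logits, no softmax here
])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
state = tf.constant([[1.0]])  # a bandit has no state, so feed a constant input


def train_step():
    with tf.GradientTape() as tape:
        logits = actor(state)                             # shape (1, N_ARMS)
        log_probs = tf.nn.log_softmax(logits)[0]          # numerically stable log pi(a)
        action = int(tf.random.categorical(logits, 1)[0, 0])  # sample an arm from pi
        reward = float(rng.normal(TRUE_MEANS[action], 0.1))   # pull the chosen arm
        loss = -log_probs[action] * reward                # REINFORCE loss
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))
    return action, reward


for _ in range(1000):
    train_step()

final_probs = tf.nn.softmax(actor(state))[0].numpy()
print("Action probabilities:", np.round(final_probs, 3))
```

The sampled action is only used as an integer index, so no gradient flows through the sampling step. The tape differentiates the log-probability of the chosen arm with respect to the network weights. After training, most of the probability mass should sit on the arm with the highest mean reward. This sketch omits a reward baseline, which would reduce the variance of the gradient estimate.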