Introduction to Experience Replay for Off-Policy Deep Reinforcement Learning

Exploring a crucial mechanism for improving the sample efficiency of deep reinforcement learning agents

Ryan S
Towards Data Science


Robotics is a domain positioned to benefit greatly from deep reinforcement learning. Photo by Arseny Togulev on Unsplash

Motivation: Sample Efficiency in Deep Reinforcement Learning Agents

Training deep reinforcement learning agents requires significant trial and error before an agent learns a robust policy for the task (or tasks) in its environment. The agent is typically told only whether its behavior produced a large or small reward; from this sparse feedback, it must indirectly learn both which actions to take and how valuable particular states and actions are. As one can imagine, this usually requires the agent to experiment with its behavior and refine its value estimates over a great many interactions.

Generating these experiences can be difficult, time-consuming, and expensive, particularly for real-life applications such as humanoid robotics. Therefore, a question many roboticists and machine learning researchers have been considering is: “How can we minimize the number of experiences we need to generate in order to successfully train robust and high-performing agents?”
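
Experience replay, the mechanism this article introduces, is one answer to that question: rather than discarding each experience after a single update, the agent stores it and reuses it. To make the idea concrete before we go further, here is a minimal sketch of a uniform-sampling replay buffer in Python. The `ReplayBuffer` class, its `push`/`sample` methods, and the transition field names are illustrative choices for this sketch, not code from any particular library.

```python
import random
from collections import deque, namedtuple

# One interaction with the environment. These field names are
# illustrative, not prescribed by any framework.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """A minimal uniform-sampling experience replay buffer (sketch)."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest transition once
        # full, so the buffer always holds the most recent experiences.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        """Store one transition generated by the agent."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        """Draw a random mini-batch of stored transitions for training."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a training loop, the agent would push each transition as it interacts with the environment and, once the buffer holds enough data, repeatedly sample mini-batches to update its network. Because sampling is random, consecutive training examples are decorrelated, and because each experience can be replayed many times, far fewer environment interactions are wasted, which is exactly the sample-efficiency gain motivated above.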

