
So, You Want to Build a Reinforcement Learning Library

A Guide for Coding RL Algorithms

Photo by Avel Chuklanov on Unsplash

I recently decided to build my own Reinforcement Learning library with the goal of rapid idea prototyping for my PhD. If you’re curious, here’s a link, although it’s definitely not entirely finished yet 😅

Having now gone through the torture myself, I feel inspired to warn against some of the mistakes I made, and hopefully spare you a few late nights debugging complicated PyTorch code!

1. Start simple!

This is BY FAR the most important piece of advice I can give. When I first learnt about reinforcement learning, I got stuck in a quest to find the most up-to-date, advanced algorithm I could use to finally master every video game out there. This led me to Phasic Policy Gradients (PPG), and I was determined to start there. Naturally, when it didn’t work, the algorithm had so many moving parts that it was almost impossible to narrow down exactly which piece was going wrong.

In the RL world, many algorithms are built on top of existing work (e.g. A2C → PPO → PPG), and in the same spirit you should implement the simplest algorithms first and build up to the more complicated ones, so there is less to debug at each stage. In the end, I scrapped PPG and went for a simple Deep Q Network (DQN) instead. It was far easier to test, with only the critic networks to worry about, and now I can be more confident that the components making up the DQN algorithm work as intended. When I later build a more complicated algorithm that shares components with DQN, I know I won’t need to revisit them if something breaks.
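To make this concrete, here is a minimal sketch of the TD target computation at the heart of DQN, the kind of small, isolated piece that is easy to test on its own before wiring in neural networks and replay buffers. Note this uses NumPy arrays and made-up shapes purely for illustration; a real DQN would compute `next_q_values` with a target network.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """DQN-style target: y = r + gamma * max_a' Q(s', a'), zeroed at terminal states."""
    max_next_q = next_q_values.max(axis=1)  # greedy bootstrap over next actions
    return rewards + gamma * max_next_q * (1.0 - dones)

# A hypothetical batch of 3 transitions (2 actions per state).
next_q = np.array([[1.0, 2.0],
                   [0.5, 0.1],
                   [3.0, 4.0]])
rewards = np.array([1.0, 0.0, 2.0])
dones = np.array([0.0, 0.0, 1.0])  # last transition is terminal

y = td_targets(rewards, next_q, dones, gamma=0.9)
# y = [1 + 0.9*2.0, 0 + 0.9*0.5, 2 + 0] = [2.8, 0.45, 2.0]
```

Because the function is pure and self-contained, a handful of hand-computed transitions like these catch sign errors and missing terminal masking long before a full training run would.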

2. First learn the theory, then try the implementation

In general, RL algorithms are more complicated than standard supervised or unsupervised methods, with intricate loss functions and a great deal of hyperparameter tuning. This means they tend not to work out of the box on every environment, and without theoretical knowledge of how the algorithm works, you may conclude you must have coded something wrong when the real issue is poorly chosen hyperparameters or simply the wrong algorithm for that environment. A good understanding of the theory allows for faster diagnosis and an intuition for where to look for mistakes.

3. Don’t be afraid to use quick scripts initially

There are many principles and guidelines for writing good software. Unfortunately, they often take a lot of thought and time to apply properly. The most important thing when coding an RL algorithm is that it works as intended; if it doesn’t work, then it doesn’t matter how good the code is, it’s unusable! A good strategy is to first ensure the underlying functionality is sound and only then worry about engineering it appropriately. Jupyter notebooks are a good tool for this: an algorithm can be quickly designed and tested in a notebook before that code is formalized into Python modules. Overall, this approach should save you time as well as boost confidence in the codebase.


So there you have it, three tips to save you time and stress when implementing your own RL algorithms.
