CURL outperforms prior pixel-based methods, both model-based and model-free, on complex tasks in the DeepMind Control Suite and Atari Games showing 1.9x and 1.2x performance gains at the 100K environment and interaction steps benchmarks respectively. CURL is the first model to show substantial data-efficiency gains from using a contrastive self-supervised learning objective for model-free RL agents across a multitude of pixel based continuous and discrete control tasks in DMControl and Atari.
By Aravind Srinivas, Michael Laskin, and Pieter Abbeel at CURL [1]
UC Berkeley researchers have made impressive improvements in sample efficiency for reinforcement learning in their latest paper, CURL, where contrastive losses are used to learn representations from high-dimensional data. [2]
I have always been a fan of unsupervised learning and its potential to boost other types of machine learning (my dissertation was partly based on this idea). In this article, I will also highlight the power of combining unsupervised learning with multi-task learning.
Contrastive learning is about distinguishing examples from one another: rather than pushing the representation of a cat image towards a single one-hot class-label vector, the loss encourages the representation to be similar to that of a crop of the same cat image and as dissimilar as possible from the representations of the other images in the dataset.
I was a bit skeptical about contrastive learning at the beginning, but now, seeing it used in many great papers such as this one, CLIP, MoCo, SimCLR, and many others, I am starting to realize its true potential.
What problem is CURL trying to solve?
Reinforcement learning is increasingly limited by the amount of data an agent needs to collect, even in simulated worlds. Even with the tremendous improvements in GPU accelerators and data storage, gathering and processing that input data is likely to remain a bottleneck for reinforcement learning.

CURL attempts to improve the sample efficiency of reinforcement learning techniques that operate in extremely high-dimensional spaces. Better sample efficiency brings reinforcement learning methods closer to working in more realistic settings, where interaction data is expensive to collect. Moreover, CURL was designed to be generic enough that it can be plugged into any RL algorithm that learns representations from high-dimensional images.
How does CURL work?
Our work falls into the first class of models, which use auxiliary tasks to improve sample efficiency. Our hypothesis is simple: If an agent learns a useful semantic representation from high dimensional observations, control algorithms built on top of those representations should be significantly more data-efficient
By Aravind Srinivas, Michael Laskin, and Pieter Abbeel at CURL [1]
Auxiliary tasks are additional tasks that are learned simultaneously with the main RL goal and that generate a more consistent learning signal.

The pipeline starts with a "replay buffer" that produces a batch of transitions. A replay buffer stores past experience, for example the stacks of game frames an agent has seen while playing, together with its actions and rewards. For each transition, a data augmentation is applied to the observation to produce a "key" and a "query". Intuitively, you can think of it as a dictionary look-up problem (this is the basis for contrastive learning).
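To make the query/key idea concrete, here is a minimal sketch of how a batch of stacked frames sampled from the replay buffer could be turned into query-key pairs with random cropping, the kind of augmentation CURL uses. The array shapes and the `random_crop` helper are my own illustrative choices, not the authors' code.

```python
import numpy as np

def random_crop(frames, out_size=84):
    """Randomly crop a (C, H, W) stack of frames to (C, out_size, out_size)."""
    _, h, w = frames.shape
    top = np.random.randint(0, h - out_size + 1)
    left = np.random.randint(0, w - out_size + 1)
    return frames[:, top:top + out_size, left:left + out_size]

# A batch of observations sampled from the replay buffer:
# each one is a stack of 3 frames at 100x100 (illustrative sizes).
batch_obs = np.random.rand(32, 3, 100, 100).astype(np.float32)

# Two independent augmentations of the SAME observation give the query and its
# positive key; keys from the other observations in the batch act as negatives.
queries = np.stack([random_crop(o) for o in batch_obs])
keys    = np.stack([random_crop(o) for o in batch_obs])
print(queries.shape, keys.shape)  # (32, 3, 84, 84) (32, 3, 84, 84)
```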
After that, both the key and the query are encoded through two separate encoders. Both encodings feed into the unsupervised (contrastive) objective, while only the query encoder's output is passed on to the reinforcement learning algorithm; the key encoder is a slowly updated, momentum-averaged copy of the query encoder. What an unusual machine learning pipeline!
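Here is a rough sketch of that two-encoder arrangement in PyTorch, assuming a small placeholder convolutional encoder and 84x84 inputs. The architecture and the momentum coefficient are illustrative; the key point, which comes from the paper, is that the key encoder receives no gradients and instead tracks the query encoder as an exponential moving average.

```python
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Placeholder conv encoder: maps an 84x84 frame stack to a latent vector."""
    def __init__(self, in_channels=3, feature_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * 20 * 20, feature_dim)  # 20x20 is the conv output for 84x84 inputs

    def forward(self, x):
        h = self.conv(x)
        return self.fc(h.flatten(start_dim=1))

query_encoder = Encoder()
key_encoder = copy.deepcopy(query_encoder)   # starts as an exact copy of the query encoder
for p in key_encoder.parameters():
    p.requires_grad = False                  # the key encoder is never updated by backprop

def momentum_update(q_enc, k_enc, m=0.95):
    """EMA update: the key encoder slowly tracks the query encoder (MoCo-style momentum)."""
    with torch.no_grad():
        for pq, pk in zip(q_enc.parameters(), k_enc.parameters()):
            pk.data.copy_(m * pk.data + (1 - m) * pq.data)

obs = torch.randn(8, 3, 84, 84)
q = query_encoder(obs)          # used by both the RL agent and the contrastive loss
with torch.no_grad():
    k = key_encoder(obs)        # keys carry no gradient
momentum_update(query_encoder, key_encoder)  # called after each gradient step
```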
If you aren’t familiar with this technique (where two objectives are trained in parallel), it’s called multi-task learning. CURL uses multi-task learning to further structure the loss function and to learn the mapping from a high-dimensional stack of image frames to a lower-dimensional representation.

Designing multi-task learning systems isn’t trivial (this comes from experience). Usually, when you’re designing these systems, you have to take into account that the different gradients can pull the weights in different directions, their magnitudes can differ wildly, and you might have to scale them with extra hyperparameters.
But one interesting characteristic of CURL is that the authors don’t mention having to do an extensive hyperparameter search or careful weighting of the gradients to make this system work.
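For intuition, a generic multi-task update might look like the sketch below. The `contrastive_weight` knob is a hypothetical hyperparameter I added to illustrate the scaling issue; CURL's observation is that the combined objective works without careful tuning of such a weight.

```python
import torch

def multitask_update(rl_loss, contrastive_loss, optimizer, contrastive_weight=1.0):
    """Generic multi-task step: gradients from both objectives flow into the shared
    (query) encoder. `contrastive_weight` is a hypothetical knob for balancing
    gradient magnitudes; CURL's authors report not needing such tuning."""
    total_loss = rl_loss + contrastive_weight * contrastive_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```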
Moreover, CURL modifies the RL algorithm by adding a new loss function that utilizes contrastive learning. This loss function uses the notion of a query and a key.
To specify this new loss function, the authors define [1]:
- A discrimination objective (think of this as the anchor used for comparing the contrastive samples)
- The transformation for generating query-key observations
- The embedding procedure for transforming observations into queries and keys
- The inner product used as a similarity measure between the query-key pairs in the contrastive loss (see the sketch after this list)
By Aravind Srinivas, Michael Laskin, and Pieter Abbeel at CURL. [1]
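Putting those ingredients together, here is a minimal sketch of the InfoNCE-style contrastive loss with the bilinear similarity q^T W k that the paper describes. The batch size and embedding dimension are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(queries, keys, W):
    """InfoNCE-style loss with a bilinear similarity q^T W k.
    queries, keys: (batch, dim) embeddings from the query/key encoders
    (keys typically come from the momentum encoder under no_grad).
    W: (dim, dim) learnable matrix. For each query, the key from the same
    transition is the positive; all other keys in the batch are negatives."""
    logits = queries @ W @ keys.t()                            # (batch, batch) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values   # for numerical stability
    labels = torch.arange(queries.size(0))                     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Example usage with random embeddings (illustrative dimensions):
q = torch.randn(32, 50)
k = torch.randn(32, 50)
W = torch.randn(50, 50, requires_grad=True)
loss = contrastive_loss(q, k, W)
```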
Final Thoughts:
CURL is the state-of-the-art image-based RL algorithm on the majority (5 out of 6) DMControl environments that we benchmark on for sample-efficiency against existing pixel-based baselines. On DMControl100k, CURL achieves 1.9x higher median performance than Dreamer (Hafner et al., 2019), a leading model-based method, and is 4.5x more data-efficient.
By Aravind Srinivas, Michael Laskin, and Pieter Abbeel at CURL. [1]
I am sure CURL has some limitations, but it’s always great to see unique approaches being used together (such as contrastive learning and multi-task learning) to achieve a new state-of-the-art benchmark.
I hope the explanations were clear and not too technical. If you are interested in more details, I suggest checking out the original paper [1] or the [code](https://github.com/MishaLaskin/curl). Also, thanks to Henry AI Labs for providing great explanations that helped me a lot in writing this article [2].
References:
[1] A. Srinivas, M. Laskin, and P. Abbeel. CURL: Contrastive Unsupervised Representations for Reinforcement Learning. 2020.
[2] Henry AI Labs. CURL: Contrastive Unsupervised Representations for Reinforcement Learning.