
Taking Reinforcement Learning Algorithms to Real World Robotics

Background, needs, challenges, and outlook

Having proven itself in gaming, commercial ML, and robotics, RL has morphed into the Swiss army knife of AI. This article is an exploration of what RL can currently do, why we need it for robotics, and what the challenges and future work look like.

Background

Reinforcement Learning (RL) refers to a paradigm of algorithms where learning happens by trial and error. The RL agent learns in a reward-based system: it takes an action and gets rewarded for success or punished for failure. The agent thus learns to perform a task by maximizing the reward. This is similar to how learning works for humans and animals (does Pavlov’s dog ring any bells?).
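To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning on a toy one-dimensional corridor. The environment, reward, and hyperparameters are made up purely for illustration: the agent tries actions, receives a reward only when it reaches the goal, and gradually learns which action to take in each state.

```python
import random

# Toy environment: a 1-D corridor. The agent starts at position 0 and is
# rewarded for reaching position 4.
N_STATES = 5          # positions 0..4, position 4 is the goal
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q-table: estimated value of taking each action in each state
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally, otherwise exploit what has been learned so far
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])

        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0  # reward only on success

        # Q-learning update: nudge the estimate toward reward + discounted future value
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy steps right (+1) in every non-goal state
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```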

RL caught our attention when AlphaGo, a computer program, defeated the world-famous professional Go player Lee Sedol. This was special because other computer programs at the time could hardly play at an amateur level, or could beat professionals only with a handicap. AlphaGo was provided with amateur game data to develop a good understanding of the game and successfully devise a strategy to beat a professional player. A few years later, RL surprised us even more when AlphaZero, yet another computer program, taught itself to play chess, shogi, and Go and beat the world-champion computer program in each case.

Commercially, RL works better than some existing ML frameworks – for example, YouTube is pushing for the use of RL for recommendations. Netflix uses RL to show more relevant, appealing, and personalized artwork that keeps users glued to the screen. Facebook uses RL for personalized push notifications [1]. Even the kinds of ads you see are decided by an RL algorithm that maximizes the click-through rate (CTR) [2, 3]. RL has also been used in stock trading, where an RL agent decides whether to hold, sell, or buy stocks. IBM has one of the most sophisticated and successful RL-based platforms for trading.
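As a hedged illustration of the idea behind CTR maximization (not any company’s actual system), the sketch below uses Thompson sampling, a simple bandit-style method: each ad variant keeps a Beta posterior over its click-through rate, and the ad shown is the one whose sampled CTR is highest. The ad names and click probabilities are made up for simulation only.

```python
import random

# Made-up "true" click probabilities, used only to simulate user responses
true_ctrs = {"ad_A": 0.04, "ad_B": 0.06, "ad_C": 0.05}
successes = {ad: 1 for ad in true_ctrs}   # Beta(1, 1) prior for each ad
failures = {ad: 1 for ad in true_ctrs}

for _ in range(10_000):
    # Sample a plausible CTR for each ad from its posterior; show the best one
    sampled = {ad: random.betavariate(successes[ad], failures[ad]) for ad in true_ctrs}
    chosen = max(sampled, key=sampled.get)

    clicked = random.random() < true_ctrs[chosen]   # simulated user response
    if clicked:
        successes[chosen] += 1
    else:
        failures[chosen] += 1

# Estimated CTR per ad; the best ad also ends up shown most often
print({ad: successes[ad] / (successes[ad] + failures[ad]) for ad in true_ctrs})
```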

Now let’s talk about real-world Robotics – why we need RL there and how it is performing.

Need

Manufacturing processes have become efficient and profitable thanks to ubiquitous robots. Robots can perform mundane, repetitive tasks in an organized environment with high accuracy and precision. However, these manufacturing processes are not robust to changes in their environments. It may seem counterintuitive, but even highly automated process lines can take days of expert labor to adapt to new product specifications.

Moreover, just coding the tasks and planning machine movements for assembly lines is a long and tedious process. Solutions that work in industry are often designed for a specific task. For the simple task of picking up an ice-cream packet from a conveyor belt, even the height and width of the packet are coded into the algorithm. A slight change in the product specification would require a change to the algorithm – which could mean changing the processing speed of the line, the motion parameters of the robotic arm, the grasping force or pressure of the robot hand, and so on.

If robots have so many problems with a slight deviation in process parameters, can you imagine them in unstructured environments in our homes, offices, or on the streets? These challenges are also why we do not have robots working in our houses and driving our cars on the streets. Until recently, we could not imagine robots working freely outside a fixed assembly-line structure; that is now changing, with Amazon’s warehouse robots, Roombas, and autonomous car companies testing their algorithms.

Wouldn’t it be nice if robots could learn to deal with these challenges from experience, i.e., using RL methods? In that case, we could leave the robot in the real world, where it tries things and learns to perform tasks from those experiences. The robot could learn to do a manufacturing task more robustly and learn new tasks, like driving or grasping all kinds of objects, that we do not know how to program accurately. After all, RL has already shown promise in other intelligent domains. The strength of RL is finding a solution in the absence of a system model: RL can find a solution when we do not know how to perform a task but know what the completed task looks like.
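To make the last point concrete, here is a hedged sketch of a reward function for a grasping task: we never tell the robot how to move, only how the completed task is scored. The observation fields (object_height, gripper_closed) are hypothetical sensor readings, not any particular robot’s API.

```python
def grasp_reward(observation: dict) -> float:
    """Score the outcome of a grasp attempt, not the motion that produced it."""
    lifted = observation["object_height"] > 0.10   # object raised at least 10 cm
    holding = observation["gripper_closed"]        # gripper reports contact
    return 1.0 if (lifted and holding) else 0.0

# The agent receives reward only once the outcome is achieved
print(grasp_reward({"object_height": 0.12, "gripper_closed": True}))   # 1.0
print(grasp_reward({"object_height": 0.02, "gripper_closed": False}))  # 0.0
```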

Challenges

In spite of the utility of RL in real-world robotics and RL’s proven capabilities, using RL for real-world robotic tasks is quite challenging. One of the well-known examples of RL in robotics is OpenAI’s robot hand solving a Rubik’s Cube. A great deal of work, sophisticated simulation software, and computational power went into making this project a success. Even so, OpenAI’s hand solves complex Rubik’s Cube scrambles with only a 20% success rate [4].

Let’s talk about the challenges of using RL on real-world robots and what makes it difficult to achieve reliable results. Most of these are taken from the paper [5] and my own experience:

Sample efficiency and safe learning

Although they learn by trial and error, RL algorithms can take a long time to do so. In the real world, keeping the robot running until learning converges can cost an incredible amount of time. Moreover, without human supervision, it is challenging to design a way for the robot to interact with its environment over long periods that is both effective and safe.

Reliable and stable learning

RL algorithms are notoriously difficult to train. Moreover, learning is quite sensitive to the choice of hyperparameters and even to the random seed used to initialize and train the algorithm.
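As a minimal sketch of how this sensitivity is usually handled in practice, results are reported as a mean and spread over several random seeds rather than from a single run. The train_and_evaluate function below is a hypothetical stand-in, not a real training loop:

```python
import random
import statistics

# Hypothetical stand-in for a full RL training run: it only returns a noisy
# score to illustrate run-to-run variance; a real experiment would train a policy.
def train_and_evaluate(seed: int) -> float:
    random.seed(seed)
    return 100.0 + random.gauss(0, 15)

# Report the mean and standard deviation across several seeds,
# not the result of one lucky (or unlucky) run
scores = [train_and_evaluate(seed) for seed in range(5)]
print(f"mean return: {statistics.mean(scores):.1f} ± {statistics.stdev(scores):.1f}")
```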

Use of simulation

The challenge of "Sample efficiency and safe learning" could be solved by "simply" training the robot in simulation first and fine-tuning the algorithm in a real-world setting. Simulation can speed up the real-world time needed to learn. Unfortunately, it is not that simple. In the worst case, creating a simulation can take as much or more time than training the robot in a real-world setting. Current simulators work great visually, but when it comes to calculating the real forces of interaction between robots and objects, they still have a long way to go.

Huge exploration space

When learning by trial and error, the exploration space is huge. For example, to learn to flip a pancake, a robot has to explore an effectively infinite action space – moving up, down, left, right, or grasping at any time. Discovering the exact sequence of reaching the pan, grasping it, and then moving it such that the pancake flips is more complex than it looks. Although learning from human demonstrations is one way to overcome this challenge, it might not always be possible.

Generalization

Learning a skill in one setting and reusing part of that skill in other settings could hugely reduce a robot’s training time and exploration space. Achieving such generalization across environments, skills, or tasks is still an unsolved problem.

The ever-changing world and changing robot parameters

The real world is highly complex, and we do not yet have sensors to measure every robot-world interaction accurately. Plus, it will always differ from experimental test settings [6]. If RL algorithms don’t receive all the information, it is unfair to expect success at all times. Moreover, a robot’s internal control parameters might change over time, and they can vary from one robot to another of the same make. So training a single robot once and expecting all other robots to perform the same task (which is sensitive to these parameters) might not work!

Outlook

The ultimate goal of RL is to make learning as scalable and natural to robots as it is to humans. Breakthroughs in the area could usher in a new era where robots learn any task on their own. Although RL is becoming more and more popular, with a lot of research and funding, the ultimate goal still seems far off.

However, we can still utilize RL agents to solve seemingly challenging tasks in real-world robotics by combining them with already established mathematical results from control theory and physics. Fast and usable results can be achieved with the right problem formulation and by providing RL with what we already know about the world. I will soon write an article about how RL problems intended for end-to-end solutions are formulated and how to reduce their complexity.

In my opinion, solving end-to-end fundamental problems that we already know solutions to, like controlling robot arm movements from scratch, is not where RL shines. Defining problems so that RL needs to learn only what we cannot model or formalize would be one of the most efficient ways to incorporate RL into real-world robotics faster – until the research and algorithms in the area catch up.

References

[1] Enterprise Applications of Reinforcement Learning: Recommenders and Simulation Modeling, Ben Lorica, 2020.

[2] How reinforcement learning chooses the ads you see, Ben Dickson, 2021.

[3] Bayesian Bandits: Behind the scenes of Facebook’s spend allocation decisioning, Eric Benjamin Seufert, 2020.

[4] Solving Rubik’s Cube with a Robot Hand, OpenAI blog, 2019.

[5] How to train your robot with deep reinforcement learning: lessons we have learned, J. Ibarz et al., The International Journal of Robotics Research, 40(4–5):698–721, 2021.

[6] The Importance of A/B Testing in Robotics, Google AI blog, 2021.

