Robots that Reason

Inorganic knowledge traditions with model-based reinforcement learning

This essay explores the concept of inorganic knowledge traditions capable of sequential improvement using model-based reinforcement learning.

Many behavioral economists now believe that humans rely on two primary modes of strategic decision-making. One is fast, intuitive and unconscious – what has been called System 1 thinking. The other, referred to as System 2 thinking, is slow, logical and stepwise, and requires effortful thought.

A growing body of evidence now supports the claim that model-free reinforcement learning, of the type popularized by DeepMind and others, corresponds functionally to System 1 thinking in Kahneman and Tversky's dual process theory. To my mind, this is a development that has not received its due. Consider that System 1 thinking likely underpins almost every form of advanced animal cognition. That a variety of RL algorithms can now functionally replicate System 1 thinking is likely to engender a renaissance in robotic control, at the very least. If one could train a home robot the way one trains a dog, what might one train it to do? Probably some pretty nifty things, a variety of weird stuff, and, let's be honest, some downright evil things! It is humans doing the training, after all, not angels.

However, that prospect overlooks an arguably more important development in the field of reinforcement learning: algorithms that can functionally replicate System 2 thinking from dual process theory – the slow, deliberate, model-based reasoning that humans are especially good at. Without wishing to put too fine a point on it, the full flowering of System 2 algorithms could make some of the bolder claims about an approaching Singularity seem not so far-fetched. That said, the purpose of this article is not to convince skeptics of the Singularity, but rather to point out some recent developments in model-based reinforcement learning and explore their implications.

Intuitive System 1 decision-making is shaped by experience: the outcomes of past actions modify our future decisions. This ability to change and adapt intuitive decisions over time has, until very recently, been the exclusive domain of biological intelligence. It is also what underlies something we could think of as "general intelligence." This is a hotly contested topic, with some arguing that there is no such thing as general intelligence, only many kinds of specialized intelligence. Intuitively, many of us are likely to feel otherwise, recognizing a common thread that underlies both our ability to learn to ride a bicycle and our ability to conquer an Atari video game. One likely candidate for that common thread is model-free reinforcement learning, and a large body of evidence exists in support of that position.
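To make that concrete, here is a minimal sketch of tabular Q-learning, the canonical model-free algorithm. The environment size and hyperparameters are illustrative assumptions, not drawn from any particular paper; the point is simply that the agent holds no model of how the world works, only value estimates nudged by raw experience.

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions for this sketch).
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1

# The agent's entire "knowledge": a table of action-value estimates.
Q = np.zeros((n_states, n_actions))

def choose_action(state):
    """Epsilon-greedy: mostly exploit current estimates, sometimes explore."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """Model-free update: shift Q toward the observed reward plus the
    discounted value of the best next action. No record of *why* the
    action worked is ever kept -- only that it did."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```

The interaction loop with an actual environment is omitted; every experience simply calls `update`, and future behavior shifts accordingly, which is the System 1 pattern described above.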

One problematic outcome of endowing computers with model-free reinforcement learning is that the agent has no way to know why its strategy worked, and no way to communicate or reason about it. Like evolution, the hand of model-free reinforcement learning is a blind one: it can optimize without being aware of what it did that led to the desirable result. It is an unconscious process in the sense that the agent is unable to reason abstractly about how particular elements of the environment combined with its actions to achieve the desired outcome. It is like the dog that successfully manipulates its owner without ever understanding why, when it sits on a pillow and makes goofy expressions, treats come raining down upon it. The dog simply knows what worked well in the past, not the mechanism behind it. As such, the path that generated those solutions is frequently hidden within the random variation the agent employed to reach its goal state.

This can make it troublesome to replicate or communicate a skill generated through model-free reinforcement learning. In a fixed environment with fixed rewards, one would expect multiple model-free RL algorithms to eventually converge to a single goal state given sufficient training; such convergence proofs exist for both SARSA and Q-learning. However, in a rich, real-world environment with a continuous action space, it might be difficult or impossible to replicate the exact combination of actions that led an agent to a desired goal state – like trying to roll a snowball down a hill so that it follows the exact path of a previous snowball, with skiers and snowmobiles crossing those same tracks all the while.
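For reference, the two algorithms just mentioned differ only in the target they bootstrap toward: SARSA is on-policy and uses the action the agent actually takes next, while Q-learning is off-policy and uses the greedy action. A side-by-side sketch, with the same illustrative table and hyperparameters assumed as in the earlier snippet:

```python
import numpy as np

# Same illustrative setup as before (assumed, not from a specific paper).
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.99
Q = np.zeros((n_states, n_actions))

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy SARSA: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(s, a, r, s_next):
    """Off-policy Q-learning: bootstrap from the greedy next action,
    regardless of what the behavior policy actually does."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

Under standard conditions both converge to the same values in a fixed tabular setting, which is precisely the convergence result the paragraph above refers to.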

Model-based reinforcement learning, on the other hand, has the advantage of creating an abstraction of the way things interact in the environment as the agent learns about them. This abstraction can then be communicated to other agents or stored for later use in other contexts, accelerating the acquisition of related skills. As such, developments in model-based reinforcement learning seem likely to constitute the next wave of advances in the field of Artificial Intelligence. DeepMind has already made impressive progress toward that goal with the introduction of algorithms for "Relational Deep Reinforcement Learning". These combine relational networks with reinforcement learning, allowing an agent to speculate about how objects in the environment interact with its actions to produce desirable results.
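A rough sketch of the core idea follows: single-head dot-product attention over a set of entity vectors, so that each object in the scene can attend to every other object and represent pairwise interactions. This is my simplified reconstruction of the relational mechanism, not DeepMind's exact architecture, and all shapes and weights here are illustrative assumptions.

```python
import numpy as np

def relational_attention(entities, Wq, Wk, Wv):
    """Single-head dot-product attention over entity vectors -- a simplified
    sketch of a relational module, not the published architecture.
    entities: (n, d) array, one row per object extracted from the scene."""
    Q, K, V = entities @ Wq, entities @ Wk, entities @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over entities
    return weights @ V  # each entity summarizes its relations to the others

# Illustrative usage: 5 entities with 8 features each (hypothetical shapes).
rng = np.random.default_rng(0)
n, d = 5, 8
entities = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
relations = relational_attention(entities, Wq, Wk, Wv)  # shape (5, 8)
```

In a full agent, the output of such a module would feed a policy network, giving the agent an explicit, inspectable representation of how objects relate rather than a blind value table.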

In many ways this shift to model-based reinforcement learning recapitulates the evolution of acquired cultural strategies in humans. Consider modern agriculture: as Jared Diamond points out in his book "The Third Chimpanzee", it likely began as an entirely unconscious process, occurring as humans gathered wild plants and, in the process, unwittingly deposited seeds near their domestic enclosures. Some of these seeds would later have sprouted, giving rise to the first proto-gardens. Thus an unconscious, model-free strategy could gradually have given rise to a conscious, model-based one as humans took to weeding, harvesting and replanting those once-wild plants. With Relational Deep Reinforcement Learning, it could be argued that computers are undergoing a similar "great leap forward", transitioning from model-free reinforcement learning strategies to ones that use models to reason and conjecture about their environments.

The upshot is that human culture, the collective knowledge of our species that can be transferred and improved upon, may soon have a counterpart within computer science and robotics. Relational networks developed by a deep RL agent can gradually be improved upon and then broadcast to other agents. In theory, this allows multiple agents to steadily improve their collective understanding of a given environment, much as human scientific knowledge has grown over time, engendering, among other things, the civilization in which we presently live.

