
Safe Artificial General Intelligence

AGI Safety Researchers 2018 Projects at Future of Life Institute

Photo by @the_las_memperor

The Future of Life Institute (FLI) keeps appearing across the articles and areas of Artificial Intelligence I have been reading; the institute is concerned with an unknown future and how it may affect us. Since I have been exploring the topic of AI Safety, this makes sense: FLI has funded a series of projects over the last five years, in two grant rounds supported by Elon Musk and different research institutes. The first round, in 2015, focused on AI Safety researchers, and the second round, in 2018, focused on artificial general intelligence (AGI) safety researchers. Since the project summaries are all available online, I decided to have a think about each in turn.

About The Future of Life Institute

FLI has a mission: to catalyze and support research and initiatives for safeguarding life and developing optimistic visions of the future, including positive ways for humanity to steer its own course considering new technologies and challenges. FLI is currently focused on keeping artificial intelligence beneficial, and it is also exploring ways of reducing risks from nuclear weapons and biotechnology. The institute is based in Boston and held a launch event at MIT in 2014. It organised a conference on the future of AI in 2015, and around 5,000 researchers signed an open letter on making AI robust and beneficial. In 2015 Elon Musk announced that he would support the effort with $10 million, with the first $6 million allocated at that time. Thirty-seven research teams were chosen, and the grant winners have produced over 45 scientific publications. I may get to those 2015 publications later; first I will focus on the most recent round and its outline.

The FLI 2018 Funding Round: Project Summaries and Thoughts on the Technical Abstracts

In the spring of 2018, FLI launched its second AI Safety research program, this time focusing on artificial general intelligence (AGI). By the summer, ten researchers had been awarded over $2 million to tackle the technical and strategic questions related to preparing for AGI. I have reposted the technical abstracts and will give some thoughts on each, noting any terminology or directions worth exploring in brief. I apologise if you find this too basic; I do it as an exercise to better understand the proposals.

1. Allan Dafoe

"Technical Abstract: ​Artificial general intelligence (AGI) may be developed within this century. While this event could bring vast benefits, such as a dramatic acceleration of scientific and economic progress, it also poses important risks. A recent survey shows that the median AI researcher believes there is at least a one-in-twenty chance of a negative outcome as extreme as human extinction. Ensuring that AGI is developed safely and beneficially, and that the worst risks are avoided, will require institutions that do not yet exist.

Nevertheless, the need to design and understand these institutions has so far inspired very little academic work. Our programme aims to address several questions that are foundational to the problem of governing advanced AI systems. We will pursue four workstreams toward this aim, concerning the state of Chinese AI research and policy thought, evolving relationships between governments and AI research firms, the prospects for verifying agreements on AI use and development, and strategically relevant properties of AI systems that may guide states’ approaches to AI governance. Outputs of the programme will include academic publications, workshops, and consultations with leading actors in AI development and policy."

Main thread:

  • This is a research project on understanding the institutions developing AI, looking at approaches to AI governance and international relations.

It has four ‘workstreams’:

  1. Chinese AI research and policy thought
  2. Relationships between governments and AI research firms
  3. Prospects for verifying agreements on AI use and development
  4. Strategically relevant properties of AI systems that may guide states’ approaches to AI governance

Allan Dafoe is the Director of the Center for the Governance of AI (GovAI) at the Future of Humanity Institute and an Associate Professor of the International Politics of Artificial Intelligence at the University of Oxford. His background is in political science and economics, and he has been a Visiting Researcher at the Department for Peace and Conflict Studies.

You can find more information about GovAI here: https://www.fhi.ox.ac.uk/GovAI/

If you want an introduction to the topic and a lecture from Allan Dafoe, you can check out this video on YouTube.

2. Stefano Ermon

"Technical Abstract: Reward specification, a key challenge in value alignment, is particularly difficult in environments with multiple agents, since the designer has to balance between individual gain and overall social utility. Instead of designing rewards by hand, we consider inverse reinforcement learning (IRL), an imitation learning technique where agents learn directly from human demonstrations. These techniques are well developed for the single agent case, and while they have limitations, they are often considered a key component for addressing the value alignment problem. Yet, multi-agent settings are relatively unexplored.

We propose to fill this gap and develop imitation learning and inverse reinforcement learning algorithms specifically designed for multi-agent settings. Our objectives are to: 1) develop techniques to imitate observed human behavior and interactions, 2) explicitly recover rewards that can explain complex strategic behaviors in multi-agent systems, enabling agents to reason about human behavior and safely co-exist, 3) develop interpretable techniques, and 4) deal with irrational agents to maximize safety. These methods will significantly improve our capabilities to understand and reason about the interactions among multiple agents in complex environments."

Main thread:

  • Balancing reward specification in complex multi-agent settings through imitation learning and inverse reinforcement learning algorithms.

Objectives:

  1. Develop techniques to imitate observed human behaviour and interactions
  2. Explicitly recover rewards that can explain complex strategic behaviors in multi-agent systems, enabling agents to reason about human behavior and safely co-exist
  3. Develop interpretable techniques
  4. Deal with irrational agents to maximize safety.
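
To make the starting point concrete, here is a minimal sketch of single-agent inverse reinforcement learning by feature matching on a toy chain world, purely to illustrate the building block the proposal wants to extend to multi-agent settings. The environment, demonstrations and update rule below are my own simplified choices, not the project’s algorithms.

```python
# Toy single-agent IRL sketch: recover a linear reward R(s) = w . phi(s) so that
# a policy trained on it matches the expert's feature expectations.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9            # tiny 1-D chain MDP
def step(s, a):                                    # a=0 moves left, a=1 moves right
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

phi = np.eye(n_states)                             # one-hot state features
expert_demos = [[0, 1, 2, 3, 4], [2, 3, 4, 4]]     # the 'expert' heads for the right end
mu_expert = np.mean([phi[s] for traj in expert_demos for s in traj], axis=0)

w = np.zeros(n_states)                             # reward weights to be learned
for _ in range(200):
    # soft value iteration under the current reward estimate
    Q = np.zeros((n_states, n_actions))
    for _ in range(50):
        V = np.log(np.exp(Q).sum(axis=1))          # soft state values
        Q = np.array([[w @ phi[s] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
    policy = np.exp(Q - Q.max(axis=1, keepdims=True))
    policy /= policy.sum(axis=1, keepdims=True)

    # estimate the learner's feature expectations by rolling the policy out
    visits = []
    for _ in range(30):
        s = rng.integers(n_states)
        for _ in range(5):
            visits.append(phi[s])
            s = step(s, rng.choice(n_actions, p=policy[s]))
    mu_policy = np.mean(visits, axis=0)

    w += 0.1 * (mu_expert - mu_policy)             # push rewards toward expert behaviour

print("recovered state rewards:", np.round(w, 2))  # the right end should get the largest weight
```

The multi-agent versions proposed in the project have to do something like this while also accounting for how each agent’s recovered reward interacts with the strategic behaviour of the others.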

Stefano Ermon is an Assistant Professor, Department of Computer Science and Fellow, Woods Institute for the Environment at Stanford University. He is affiliated with the Artificial Intelligence Laboratory. He has an educational background in electrical engineering, with a PhD in Computer Science with a minor in Applied Mathematics. You can read one of his papers on arXiv here.

If you want to see a talk from Stefano, check it out here; he shares some thoughts on this topic that may be more up to date:

3. Owain Evans

"Technical Abstract: Our goal is to understand how Machine Learning can be used for AGI in a way that is ‘safely scalable’, i.e. becomes increasingly aligned with human interests as the ML components improve. Existing approaches to AGI (including RL and IRL) are arguably not safely scalable: the agent can become un-aligned once its cognitive resources exceed those of the human overseer. Christiano’s Iterated Distillation and Amplification (IDA) is a promising alternative. In IDA, the human and agent are ‘amplified’ into a resourceful (but slow) overseer by allowing the human to make calls to the previous iteration of the agent. By construction, this overseer is intended to always stay ahead of the agent being overseen.

Could IDA produce highly capable aligned agents given sufficiently advanced ML components? While we cannot directly get empirical evidence today, we can study it indirectly by running amplification with humans as stand-ins for AI. This corresponds to the study of ‘factored cognition’, the question of whether sophisticated reasoning can be broken down into many small and mostly independent sub-tasks. We will explore schemes for factored cognition empirically and exploit automation via ML to tackle larger tasks."

Main thread:

  • How can factored cognition be used in Iterated Distillation and Amplification (IDA)? In IDA, the human and agent are ‘amplified’ into a resourceful (but slow) overseer by allowing the human to make calls to the previous iteration of the agent.

You can read more about factored cognition here:

Factored Cognition

I think an image from that text may explain it more succinctly, though I may be wrong.
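
Since the proposal hinges on whether reasoning can be broken into sub-tasks, here is a deliberately tiny sketch of the amplification-then-distillation loop, with list summation standing in for a real cognitive task and memoisation standing in for training. All of those stand-ins are my own simplifications, not the project’s setup.

```python
# Toy sketch of Iterated Distillation and Amplification (IDA) on a trivial task.

def base_agent(task):
    """A weak agent that can only handle very small tasks directly."""
    if len(task) <= 2:
        return sum(task)
    return 0  # beyond its capability

def amplify(agent, task):
    """Overseer = human-style decomposition plus calls to the current agent."""
    if len(task) <= 2:
        return agent(task)
    mid = len(task) // 2
    # each sub-task is handled by a (recursively amplified) call to the agent
    return amplify(agent, task[:mid]) + amplify(agent, task[mid:])

def distill(amplified_answers):
    """Stand-in for training a faster agent to imitate the slow overseer:
    here we simply memoise the overseer's answers."""
    cache = dict(amplified_answers)
    def new_agent(task):
        return cache.get(tuple(task), base_agent(task))
    return new_agent

task = [3, 1, 4, 1, 5, 9, 2, 6]
overseer_answer = amplify(base_agent, task)              # slow but capable
agent_v2 = distill([(tuple(task), overseer_answer)])     # fast imitation of the overseer
print(overseer_answer, agent_v2(task))                   # 31 31
```

The empirical question in the proposal is whether this kind of decomposition still works when the tasks are open-ended reasoning problems rather than sums.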

Owain Evans is Alexander Tamas Research Scientist in Artificial Intelligence at the University of Oxford. He is a research scientist working on AI Safety and Reinforcement Learning at the Future of Humanity Institute (directed by Nick Bostrom). His PhD is from MIT, where he worked on cognitive science, AI, and philosophy.

4. The Anh Han

"Technical Abstract: An AI race for technological advantage towards powerful AI systems could lead to serious negative consequences, especially when ethical and safety procedures are underestimated or even ignored. For all to enjoy the benefits provided by a safe, ethical and trustworthy AI, it is crucial to enact appropriate incentive strategies that ensure mutually beneficial, normative behaviour and safety-compliance from all parties involved. Using methods from Evolutionary Game Theory, this project will develop computational models (both analytic and simulated) that capture key factors of an AI race, revealing which strategic behaviours would likely emerge in different conditions and hypothetical scenarios of the race.

Moreover, applying methods from incentives and agreement modelling, we will systematically analyse how different types of incentives (namely, positive vs. negative, peer vs. institutional, and their combinations) influence safety-compliance behaviours over time, and how such behaviours should be configured to ensure desired global outcomes, without undue restrictions that would slow down development. The project will thus provide foundations on which incentives will stimulate such outcomes, and how they need to be employed and deployed, within incentive boundaries suited to types of players, in order to achieve high level of compliance in a cooperative safety agreement and avoid AI disasters."

Main thread:

  • Simulating and analysing the AI race: building computational models with methods from Evolutionary Game Theory to look at how different incentives influence safety-compliance behaviour over time.
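
As a flavour of what such a model can look like, here is a minimal replicator-dynamics sketch of a two-strategy ‘AI race’ game (developers who follow safety procedures versus those who skip them), with an institutional sanction applied to unsafe behaviour. The payoff numbers and sanction values are invented for illustration and are not taken from the project.

```python
# Replicator dynamics for a toy SAFE vs. UNSAFE development race.
import numpy as np

def payoff_matrix(prize=4.0, safety_cost=1.0, disaster_loss=2.0, sanction=0.0):
    # rows/cols: 0 = SAFE, 1 = UNSAFE; entry [i, j] = payoff to a row player meeting a column player
    return np.array([
        [prize / 2 - safety_cost, -safety_cost],   # SAFE splits the prize with SAFE, loses the race to UNSAFE
        [prize - disaster_loss - sanction,         # UNSAFE beats SAFE but risks disaster and sanctions
         prize / 2 - disaster_loss - sanction],
    ])

def replicator(x_safe, A, steps=2000, dt=0.01):
    """Evolve the population share of SAFE developers under replicator dynamics."""
    for _ in range(steps):
        pop = np.array([x_safe, 1.0 - x_safe])
        fitness = A @ pop                          # expected payoff of each strategy
        avg = pop @ fitness                        # population-average payoff
        x_safe += dt * x_safe * (fitness[0] - avg)
        x_safe = min(max(x_safe, 0.0), 1.0)
    return x_safe

for sanction in (0.0, 0.5, 1.5):
    final = replicator(0.5, payoff_matrix(sanction=sanction))
    print(f"sanction={sanction:.1f} -> long-run share of SAFE developers ~ {final:.2f}")
```

Even in this toy version there is a threshold effect: weak sanctions leave unsafe development dominant, while a sufficiently strong one tips the population towards safety compliance. The project asks this kind of question with far richer models of incentives and agreements.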

The Anh Han is currently a Senior Lecturer in Computer Science at the School of Computing, Media and the Arts, Teesside University.

5. Jose Hernandez-Orallo

"Technical Abstract: Many paradigms exist, and more will be created, for developing and understanding AI. Under these paradigms, the key benefits and risks materialise very differently. One dimension pervading all these paradigms is the notion of generality, which plays a central role, and provides the middle letter, in AGI, artificial general intelligence. This project explores the safety issues of present and future AGI paradigms from the perspective of measures of generality, as a complementary dimension to performance.

We investigate the following research questions:

  1. Should we define generality in terms of tasks, goals or dominance? How does generality relate to capability, to computational resources, and ultimately to risks?
  2. What are the safe trade-offs between general systems with limited capability or less general systems with higher capability? How is this related to the efficiency and risks of automation?
  3. Can we replace the monolithic notion of performance explosion with breadth growth? How can this help develop safe pathways for more powerful AGI systems?

These questions are analysed for paradigms such as reinforcement learning, inverse reinforcement learning, adversarial settings (Turing learning), oracles, cognition as a service, learning by demonstration, control or traces, teaching scenarios, curriculum and transfer learning, naturalised induction, cognitive architectures, brain-inspired AI, among others."

Main thread:

  • Exploring measures of generality as a complementary dimension to performance: (1) how to define generality, (2) trade-offs between generality and capability, and (3) replacing the notion of a performance explosion with breadth growth.
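
To make the distinction tangible, here is a toy illustration of treating generality (breadth across tasks) as a separate measure from capability (average performance). The scores and the specific measures are my own invented stand-ins, not the measures the project itself proposes.

```python
# Toy contrast between capability (mean score) and generality (breadth of tasks mastered).
import numpy as np

scores = {                       # invented per-task scores in [0, 1] for two hypothetical systems
    "narrow_specialist": np.array([1.00, 0.95, 0.90, 0.05, 0.00]),
    "broad_generalist":  np.array([0.50, 0.50, 0.50, 0.50, 0.50]),
}

threshold = 0.4                  # competence level a task must reach to count as 'mastered'
for name, s in scores.items():
    capability = s.mean()                     # performance averaged over tasks
    generality = (s >= threshold).mean()      # breadth: fraction of tasks above the threshold
    print(f"{name:18s} capability={capability:.2f} generality={generality:.2f}")
```

The project’s trade-off question is, roughly, which of these two profiles is the safer one as capability grows.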

José Hernández-Orallo is Professor of Information Systems and Computation at the Universitat Politècnica de València, Spain. He has published four books and more than a hundred articles and papers on artificial intelligence, machine learning, data mining, cognitive science, and information systems. His work in the area of machine intelligence evaluation has been covered by both scientific and popular outlets, including The Economist and New Scientist. He pioneered the application of algorithmic information theory to the development of artificial intelligence tests.

6. Marcus Hutter

"Technical Abstract: The agent framework, the expected utility principle, sequential decision theory, and the information-theoretic foundations of inductive reasoning and machine learning have already brought significant order into the previously heterogeneous scattered field of artificial intelligence (AI). Building on this, in the last decade I have developed the theory of Universal AI. It is the first and currently only mathematically rigorous top ‘down approach to formalize artificial general intelligence.

This project will drive forward the theory of Universal AI to address what might be the 21st century’s most significant existential risk: solving the Control Problem, the unique principal-agent problem that arises with the creation of an artificial superintelligent agent. The goal is to extend the existing theory to enable formal investigations into the Control Problem for generally intelligent agents. Our focus is on the most essential properties that the theory of Universal AI lacks, namely a theory of agents embedded in the real world: it does not model itself reliably, it is constrained to a single agent, it does not explore safely, and it is not well-understood how to specify goals that are aligned with human values."

Main thread:

  • Drive the theory of Universal AI forward to address the Control Problem that arises with a superintelligent agent, by extending it to a theory of agents embedded in the real world.
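
For reference, the central object of Universal AI is Hutter’s AIXI agent, which (in the usual notation, with actions a, observations o, rewards r, horizon m, a universal Turing machine U, and environment programs q of length ℓ(q)) picks actions by maximising expected reward under a Solomonoff-style mixture over all computable environments:

a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

The properties listed in the abstract are exactly what this idealised formulation leaves out: the agent is not modelled as embedded in its own environment, there is only one agent, exploration is not guaranteed to be safe, and the reward signal is not tied to human values.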

Marcus Hutter is Professor in the Research School of Computer Science (RSCS) at the Australian National University (ANU) in Canberra. Before that he was with IDSIA in Switzerland and NICTA. His research at RSCS/ANU/NICTA/IDSIA is/was centered around Universal Artificial Intelligence, which is a mathematical top-down approach to AI, based on Kolmogorov complexity, algorithmic probability, universal Solomonoff induction, Occam’s razor, Levin search, sequential decision theory, dynamic programming, reinforcement learning, and rational agents.

7. James Miller

"Technical Abstract: Economists, having long labored to create mathematical tools that describe how hyper-rational people behave, might have devised an excellent means of modeling future computer superintelligences. This guide explains the uses, assumptions, and limitations of utility functions in the hope of becoming a valuable resource to artificial general intelligence (AGI) theorists.

The guide will critique the AGI literature on instrumental convergence which theorizes that for many types of utility functions an AGI would have similar intermediate goals. The guide considers the orthogonality thesis, which holds that increasing an AGI’s intelligence does not shrink the set of utility functions it could have. This guide explores utility functions that might arise in an AGI but usually do not in economic research, such as those with instability, always increasing marginal utility, extremely high or low discount rates, those that can be self-modified, or those with preferences that violate one of the assumptions of the von Neumann-Morgenstern utility theorem.

The guide considers the possibility that extraterrestrials have developed computer superintelligences that have converged on utility functions consistent with the Fermi paradox. Finally, the plausibility of an AGI getting its values from human utility functions, even given the challenge that humans have divergent preferences, is explored."

Main thread:

  • A guide that critiques the AGI literature and explores utility functions that might arise in an AGI.

Let me explain two concepts that may seem confusing.

The von Neumann-Morgenstern utility theorem shows that, under certain axioms of rational behaviour, a decision-maker faced with risky (probabilistic) outcomes of different choices will behave as if he or she is maximizing the expected value of some function defined over the potential outcomes at some specified point in the future.
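
Stated a little more formally (this is the standard textbook statement, not something taken from the guide): if preferences over lotteries satisfy completeness, transitivity, continuity and independence, then there exists a utility function u, unique up to positive affine transformation, such that

L \succeq M \iff \sum_{i} p_i \, u(x_i) \;\ge\; \sum_{j} q_j \, u(x_j)

where lottery L yields outcome x_i with probability p_i and lottery M yields outcome x_j with probability q_j. Several of the utility functions the guide explores are interesting precisely because they violate one of these assumptions.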

The Fermi paradox is the apparent contradiction between the seemingly high probability that extraterrestrial civilizations exist and the complete lack of evidence that we have observed any. In this guide it is used to reason about what utility functions alien superintelligences might have converged on, if they exist.

The Fermi paradox may sound confusing, and it is, so here is a Kurzgesagt video that might help to explain.

8. Dorsa Sadigh

"Technical Abstract: Recent developments in artificial intelligence (AI) have enabled us to build AI agents and robots capable of performing complex tasks, including many that interact with humans. In these tasks, it is desirable for robots to build predictive and robust models of humans’ behaviors and preferences: a robot manipulator collaborating with a human needs to predict her future trajectories, or humans sitting in self-driving cars might have preferences for how cautiously the car should drive.

In reality, humans have different preferences, which can be captured in the form of a mixture of reward functions. Learning this mixture can be challenging due to having different types of humans. It is also usually assumed that these humans are approximately optimizing the learned reward functions. However, in many safety-critical scenarios, humans follow behaviors that are not easily explainable by the learned reward functions due to lack of data or misrepresentation of the structure of the reward function. Our goal in this project is to actively learn a mixture of reward functions by eliciting comparisons from a mixed set of humans, and further analyze the generalizability and robustness of such models for safe and seamless interaction with AI agents."

Main thread:

  • The goal is to learn a mixture of reward functions from humans with different preferences. In safety-critical scenarios, humans follow behaviours that are not easily explained by the learned reward functions.
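
Here is a minimal sketch of the kind of preference-based reward learning this builds on: a simulated human answers pairwise comparisons between trajectories, and we fit linear reward weights so that preferred trajectories score higher (a Bradley-Terry style model). The features, weights and numbers are invented; learning a mixture over different human types, as the project proposes, would add a latent type variable on top of this.

```python
# Toy preference-based reward learning from pairwise trajectory comparisons.
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])             # hidden 'human' preference weights (invented)

def simulate_comparison(phi_a, phi_b):
    """Noisy human answer: tends to prefer the trajectory with higher true reward."""
    p_a = 1.0 / (1.0 + np.exp(-(true_w @ (phi_a - phi_b))))
    return 1 if rng.random() < p_a else 0       # 1 means trajectory A preferred

# collect comparisons over random trajectory features
data = []
for _ in range(500):
    phi_a, phi_b = rng.normal(size=3), rng.normal(size=3)
    data.append((phi_a, phi_b, simulate_comparison(phi_a, phi_b)))

# fit w by gradient ascent on the comparison log-likelihood
w = np.zeros(3)
for _ in range(300):
    grad = np.zeros(3)
    for phi_a, phi_b, label in data:
        diff = phi_a - phi_b
        p_a = 1.0 / (1.0 + np.exp(-(w @ diff)))
        grad += (label - p_a) * diff            # gradient of the Bernoulli log-likelihood
    w += 0.01 * grad / len(data)

print("true  w:", true_w)
print("learnt w (up to scale):", np.round(w / np.abs(w).max() * 2, 2))
```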

Dorsa Sadigh is an Assistant Professor in the Computer Science Department and Electrical Engineering Department at Stanford University. Her work focuses on the design of algorithms for autonomous systems that safely and reliably interact with people.

She spoke about the topic at the launch of Stanford HAI, and one particular image caught my eye in this regard.

You can see the full video here from which the image is taken:

9. Peter Stone

"Technical Abstract: As technology develops, it is only a matter of time before agents will be capable of long term (general purpose) autonomy, i.e., will need to choose their actions by themselves for a long period of time. Thus, in many cases agents will not be able to be coordinated in advance with all other agents with which they may interact.

Instead, agents will need to cooperate in order to accomplish unanticipated joint goals without pre-coordination. As a result, the "ad hoc teamwork" problem, in which teammates must work together to obtain a common goal without any prior agreement regarding how to do so, has emerged as a recent area of study in the AI literature.

However, to date, no attention has been dedicated to the moral aspect of the agents’ behavior. In this research, we introduce the M-TAMER framework (a novel variant of TAMER) used to teach agents the idea of human morality. Using a hybrid team (agents and people), if taking an action considered to be morally bad, the agents will receive negative feedback from the human teammate(s). Using M-TAMER, agents will be able to develop an "inner-conscience" which will enable them to act consistently with human morality."

Main thread:

  • Ad hoc teamwork, in which agents cooperate on joint goals without pre-coordination, raises questions about the moral aspect of the agents’ behaviour. The M-TAMER framework (a variant of TAMER) is used to teach agents the idea of human morality through negative feedback from human teammates.
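
To give a feel for the TAMER idea, here is a minimal sketch of its basic loop: the agent keeps an estimate H(s, a) of the human feedback it expects for each action and acts greedily on that estimate. The tiny environment, the simulated teammate, and the notion of a ‘morally bad’ action are invented stand-ins for illustration; M-TAMER itself is the authors’ extension for moral feedback from a hybrid human-agent team.

```python
# Toy TAMER-style loop: learn a model H(s, a) of human feedback and act greedily on it.
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions = 4, 3
BAD_ACTION = 2                                   # stand-in for a 'morally bad' choice

def human_feedback(state, action):
    """Simulated teammate: punishes the bad action, mildly rewards the rest."""
    return -1.0 if action == BAD_ACTION else 0.2

H = np.zeros((n_states, n_actions))              # learned model of human feedback
alpha, epsilon = 0.2, 0.1                        # learning rate, exploration rate

for _ in range(2000):
    state = rng.integers(n_states)
    if rng.random() < epsilon:                   # occasionally explore
        action = rng.integers(n_actions)
    else:                                        # otherwise follow predicted feedback
        action = int(np.argmax(H[state]))
    feedback = human_feedback(state, action)
    H[state, action] += alpha * (feedback - H[state, action])

print("learned feedback estimates per state:\n", np.round(H, 2))
print("bad action chosen greedily anywhere?", bool((H.argmax(axis=1) == BAD_ACTION).any()))
```

After training, the action the teammate punishes ends up with a low estimate and stops being chosen greedily, which is the ‘inner-conscience’ idea from the abstract in miniature.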

Peter Stone is the founder and director of the Learning Agents Research Group (LARG) within the Artificial Intelligence Laboratory in the Department of Computer Science at The University of Texas at Austin, as well as associate department chair and chair of the University’s Robotics Portfolio Program.

You can watch a TED Talk where he talks about RoboCup here:

10. Josh Tenenbaum

"Technical Abstract: A hallmark of human cognition is the flexibility to plan with others across novel situations in the presence of uncertainty. We act together with partners of variable sophistication and knowledge and against adversaries who are themselves both heterogeneous and flexible. While a team of agents may be united by common goals, there are often multiple ways for the group to actually achieve those goals.

In the absence of centralized planning or perception and constrained or costly communication, teams of agents must efficiently coordinate their plans with respect to the underlying differences across agents. Different agents may have different skills, competencies or access to knowledge. When environments and goals are changing, this coordination has elements of being ad-hoc.

Miscoordination can lead to unsafe interactions and cause injury and property damage and so ad-hoc teamwork between humans and agents must be not only efficient but robust. We will both investigate human ad-hoc and dynamic collaboration and build formal computational models that reverse-engineer these capacities. These models are a key step towards building machines that can collaborate like people and with people."

Main thread:

  • This research will investigate human ad-hoc and dynamic collaboration and build formal computational models that reverse-engineer these capacities. This is a key step towards building machines that can collaborate like people and with people.
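
One classic ingredient of reverse-engineering collaboration is inferring a partner’s goal from their behaviour. Here is a toy sketch of Bayesian goal inference with a noisily-rational partner in a one-dimensional world; the world, the candidate goals and the rationality parameter are invented for illustration and do not come from the project.

```python
# Toy Bayesian goal inference: update beliefs about a partner's goal from observed moves.
import numpy as np

positions = np.arange(7)           # a 1-D corridor
goals = {"left_end": 0, "right_end": 6}
beta = 2.0                         # assumed rationality of the partner

def action_prob(pos, action, goal):
    """P(action | position, goal) for actions -1 (left) and +1 (right)."""
    utils = {a: -abs((pos + a) - goal) for a in (-1, +1)}   # closer to the goal is better
    exp_u = {a: np.exp(beta * u) for a, u in utils.items()}
    z = sum(exp_u.values())
    return exp_u[action] / z

# observe a partner starting at position 3 and stepping right three times
observed = [(3, +1), (4, +1), (5, +1)]

posterior = {g: 0.5 for g in goals}               # uniform prior over goals
for pos, action in observed:
    for g, target in goals.items():
        posterior[g] *= action_prob(pos, action, target)
    z = sum(posterior.values())
    posterior = {g: p / z for g, p in posterior.items()}

print({g: round(p, 3) for g, p in posterior.items()})   # mass shifts to 'right_end'
```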

Joshua Brett Tenenbaum is Professor of Cognitive Science and Computation at the Massachusetts Institute of Technology. He is known for contributions to mathematical psychology and Bayesian cognitive science.

He talks about this direction of research in this video from 2018:


Conclusion

There are many interesting researchers in the Future of Life Institute’s 2018 program, so I now have much more to explore than I did previously. The projects show both a critical approach to and a curiosity about artificial general intelligence safety.


This is day 68 of #500daysofAI. My current focus for days 50–100 is AI Safety. If you enjoy this, please give me a response, as I want to improve my writing and discover new research, companies and projects.

