4 Autonomous AI Agents you need to know

“Westworld” simulation, Camel, BabyAGI, AutoGPT ⭐ with the power of LangChain ⭐

Sophia Yang, Ph.D.
Towards Data Science


Autonomous AI agents have been the hottest topic lately. It's truly impressive how rapidly things have progressed and unfolded in this area. Are autonomous AI agents the future, particularly in the area of prompt engineering? AI experts, including Andrej Karpathy, have referred to AutoGPTs as the next frontier of prompt engineering. I think so as well. What do you think?

In their simplest form, autonomous AI agents run on a loop, generating self-directed instructions and actions at each iteration. As a result, they do not rely on humans to guide their conversations, and they are highly scalable. At least four notable autonomous AI agent projects came out in the last two weeks, and in this article, we are going to dive into each of them:

  • “Westworld” simulation — released on Apr. 7
  • Camel — released on Mar. 21
  • BabyAGI — released on Apr. 3
  • AutoGPT — released on Mar. 30

Project 1: “Westworld” simulation

Figure 1. Generative agents create believable simulacra of human behavior. Source: https://arxiv.org/pdf/2304.03442.pdf

Researchers from Stanford and Google created an interactive sandbox environment with 25 generative AI agents that simulate human behavior. They walk in the park, meet for coffee at a cafe, and share news with colleagues. They demonstrated surprisingly good social behaviors:

“For example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.”

These believable simulations of human behavior are possible because of an agent architecture (see Figure 2) that extends a large language model with three key components: memory, reflection, and planning.

Figure 2. Generative agent architecture. Source: https://arxiv.org/pdf/2304.03442.pdf

1) Memory and Retrieval

The memory stream contains a timestamped list of observations for each agent. Observations can be behaviors performed by the agent or behaviors that the agent perceives from others. The memory stream is long, but not all observations in it are important.

To retrieve the most important memories to pass to the language model, the architecture considers three factors (combined in the sketch after Figure 3):

  • Recency: recent memories are more important.
  • Importance: memories the agent believes to be important. For example, breaking up with someone is a more important memory than eating breakfast.
  • Relevance: memories that are related to the current situation (the query). For example, when discussing what to study for a chemistry test, schoolwork memories are more relevant.

Figure 3. The memory stream comprises a large number of observations. Retrieval identifies a subset of these observations that should be passed to the language model. Source: https://arxiv.org/pdf/2304.03442.pdf
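For intuition, here is a minimal sketch (not the paper's code) of how the three factors can be combined into a single retrieval score. The 0.995-per-hour decay and the equal weighting follow the paper's description; the Memory class and all other names are illustrative:

import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                  # 1-10, e.g. rated by the LLM when stored
    embedding: list[float]             # embedding of `text`
    last_accessed: float = field(default_factory=time.time)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieval_score(m: Memory, query_embedding: list[float], now: float) -> float:
    hours = (now - m.last_accessed) / 3600
    recency = 0.995 ** hours                        # exponential decay per hour
    importance = m.importance / 10                  # scale to [0, 1]
    relevance = cosine(m.embedding, query_embedding)
    return recency + importance + relevance         # equal weights

def retrieve(memories: list[Memory], query_embedding: list[float], k: int = 5) -> list[Memory]:
    now = time.time()
    ranked = sorted(memories, key=lambda m: retrieval_score(m, query_embedding, now), reverse=True)
    return ranked[:k]                               # top-k memories go into the prompt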

2) Reflection

Reflections are high-level abstract thoughts that help agents generalize and make inferences. Reflections are generated periodically with two prompts: "What are the 3 most salient high-level questions we can answer about the subjects in the statements?" and "What 5 high-level insights can you infer from the above statements?"

Figure 4. A reflection tree. Source: https://arxiv.org/pdf/2304.03442.pdf
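Here is a minimal sketch of how this reflection step could be wired up, assuming an llm callable, an embed function that maps text to an embedding, and the retrieve helper sketched above. The two prompts come from the paper; everything else is illustrative:

def reflect(llm, memories, retrieve, embed):
    """Generate high-level reflections from the most recent memories."""
    recent = "\n".join(m.text for m in memories[-100:])
    questions = llm(
        recent
        + "\n\nGiven only the information above, what are the 3 most salient "
        "high-level questions we can answer about the subjects in the statements?"
    )
    insights = []
    for question in filter(str.strip, questions.splitlines()):
        # Gather the memories relevant to each question, then ask for insights
        relevant = retrieve(memories, embed(question))
        statements = "\n".join(m.text for m in relevant)
        insights.append(llm(
            "Statements:\n" + statements
            + "\n\nWhat 5 high-level insights can you infer from the above statements?"
        ))
    return insights  # reflections are stored back into the memory stream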

3) Planning

Planning is important because actions should be coherent and believable not just in the moment but over a longer time horizon. A plan is also stored in the memory stream. Agents create actions based on the plan, and they can react to other observations in the memory stream and update the plan accordingly.
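A rough sketch of the paper's top-down plan decomposition (a broad day plan refined into finer-grained chunks); the function and prompt wording here are illustrative, not the paper's:

def plan_day(llm, agent_summary: str, today: str) -> list[str]:
    """Draft a broad-strokes plan for the day, then decompose each item."""
    outline = llm(
        agent_summary
        + f"\nToday is {today}. In broad strokes, plan this agent's day, "
        "one item per line."
    )
    detailed = []
    for item in filter(str.strip, outline.splitlines()):
        detailed.append(llm(
            "Decompose this plan item into 5-15 minute actions, one per line:\n" + item
        ))
    return detailed  # plans are stored in the memory stream like observations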

Figure 5. Valentine’s Day party. Source: https://arxiv.org/pdf/2304.03442.pdf

The possibilities for applications are immense, and maybe even a little scary. Imagine an assistant that observes your every move, makes plans for you, and perhaps even executes them for you. It'd automatically adjust the lights, brew the coffee, and reserve dinner for you before you even tell it to do anything.

⭐LangChain Implementation⭐

…Coming soon…

I heard LangChain is working on this ;) Will add it once it’s implemented.

Project 2: Camel

CAMEL (Communicative Agents for “Mind” Exploration of Large Scale Language Model Society) proposes a role-playing agent framework where two AI agents communicate with each other:

1) AI user agent: gives instructions to the AI assistant with the goal of completing the task.

2) AI assistant agent: follows the AI user’s instructions and responds with solutions to the task.

3) Task-specifier agent: a third agent that brainstorms a specific task for the AI user and AI assistant to complete. This helps write a concrete task prompt without the user spending time defining it.

In this example (Figure 6), a human has an idea of developing a trading bot. The AI user is a stock trader, and the AI assistant is a Python programmer. The task-specifier agent first comes up with a specific task with task details (monitor social media sentiment and trade stocks based on the sentiment analysis results). Then the AI user agent becomes the task planner, the AI assistant agent becomes the task executor, and they prompt each other in a loop until some termination condition is met.

Figure 6. Role-playing framework. Source: https://arxiv.org/abs/2303.17760

The essence of Camel lies in its prompt engineering, i.e., inception prompting. The prompts are carefully defined to assign roles, prevent role flipping, prohibit harm and false information, and encourage consistent conversation. See the detailed prompts in the Camel paper.
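To give a flavor, here is a condensed paraphrase of the assistant's inception prompt; the exact wording is in the paper's appendix, so treat this as an approximation:

# Condensed paraphrase of the assistant inception prompt (see the paper for the original)
assistant_inception_prompt = """Never forget you are a {assistant_role_name} and I am a {user_role_name}.
Never flip roles! Never instruct me!
We share a common interest in collaborating to successfully complete a task: {task}.
I will instruct you based on your expertise and my needs to complete the task.
I must give you one instruction at a time.
You must write a specific solution that appropriately completes the requested instruction.
Unless I say the task is completed, you should always start with:
Solution: <YOUR_SOLUTION>"""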

⭐LangChain Implementation⭐

The LangChain implementation uses the prompts from the Camel paper and defines three agents: task_specify_agent, assistant_agent, and user_agent. It then uses a while loop to step through the conversation between the assistant agent and the user agent:

from langchain.schema import HumanMessage  # message type passed between agents

chat_turn_limit, n = 30, 0
while n < chat_turn_limit:
    n += 1
    # The AI user turns the assistant's last message into the next instruction
    user_ai_msg = user_agent.step(assistant_msg)
    user_msg = HumanMessage(content=user_ai_msg.content)
    print(f"AI User ({user_role_name}):\n\n{user_msg.content}\n\n")

    # The AI assistant responds to the instruction with a solution
    assistant_ai_msg = assistant_agent.step(user_msg)
    assistant_msg = HumanMessage(content=assistant_ai_msg.content)
    print(f"AI Assistant ({assistant_role_name}):\n\n{assistant_msg.content}\n\n")

    # The user agent emits this token when it judges the task complete
    if "<CAMEL_TASK_DONE>" in user_msg.content:
        break

The results look quite reasonable!

In Camel, the AI assistant’s executions are simply answers from the language model without actually using any tools to run the Python code. I wonder if LangChain has plans to integrate Camel with all the amazing LangChain tools 🤔

🐋 Real-world use cases 🐋

  • Make a game
  • Infiltrate communication networks

Project 3: BabyAGI

Yohei Nakajima announced the “Task-driven Autonomous Agent” on March 28 and then open-sourced the BabyAGI project on April 3. The key feature of BabyAGI is just three agents: a task execution agent, a task creation agent, and a task prioritization agent.

1) The task execution agent completes the first task from the task list.

2) The task creation agent creates new tasks based on the objective and the result of the previous task.

3) The task prioritization agent then reorders the tasks.

And then this simple process gets repeated over and over.
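Here is an illustrative skeleton of that loop; the three agent arguments and store_result stand in for LLM calls with BabyAGI's corresponding prompts, so none of this is BabyAGI's actual code:

from collections import deque

def babyagi_loop(objective, execution_agent, creation_agent, prioritization_agent,
                 store_result, max_iterations=3):
    """Illustrative skeleton of the BabyAGI loop (not the actual implementation)."""
    task_list = deque([{"id": 1, "name": "Develop an initial task list"}])
    for _ in range(max_iterations):
        if not task_list:
            break
        # 1) Execute the top task
        task = task_list.popleft()
        result = execution_agent(objective, task["name"])
        store_result(task, result)  # e.g. persist to a vector store

        # 2) Create new tasks from the objective and the latest result
        new_tasks = creation_agent(objective, result, task["name"],
                                   [t["name"] for t in task_list])
        task_list.extend(new_tasks)

        # 3) Reprioritize everything that is left
        task_list = deque(prioritization_agent(objective, list(task_list)))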

In a LangChain webinar, Yohei mentioned that he designed BabyAGI to emulate how he works. Specifically, he starts each morning by tackling the first item on his to-do list and then works through his tasks. If a new task arises, he simply adds it to his list. At the end of the day, he reevaluates and reprioritizes his list. This same approach was then mapped onto the agent.

Figure 7. BabyAGI flow chart. Source: https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/ (fun fact: GPT-4 wrote this research paper)

⭐BabyAGI + LangChain⭐

BabyAGI is easy to run within the LangChain framework. Check out the code here. It basically creates a BabyAGI controller composed of three chains, TaskCreationChain, TaskPrioritizationChain, and ExecutionChain, and runs them in a (potentially) infinite loop. With LangChain, you can define the maximum number of iterations so that it doesn't run forever and spend all your money on the OpenAI API.

from typing import Optional
from langchain.llms import OpenAI
# `BabyAGI` and `vectorstore` are defined earlier in the linked notebook

OBJECTIVE = "Write a weather report for SF today"
llm = OpenAI(temperature=0)
# Logging of LLMChains
verbose = False
# If None, will keep on going forever
max_iterations: Optional[int] = 3
baby_agi = BabyAGI.from_llm(
    llm=llm,
    vectorstore=vectorstore,
    verbose=verbose,
    max_iterations=max_iterations,
)
baby_agi({"objective": OBJECTIVE})

Here is the result from a run with 2 iterations:

⭐BabyAGI + LangChain Tools⭐ = Superpower

As you can see from the example above, BabyAGI only “executes” tasks with a plain LLM response. With the power of LangChain tools, the execution step can use various tools, for example Google Search, to actually search for information online. Here is an example where the “execution” uses Google Search to look up the current weather conditions in San Francisco.
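Below is a sketch of how the tool-using execution step is wired up, loosely following the LangChain BabyAGI-with-tools example. It assumes a SerpAPI key for the Search tool, and the exact API may differ across LangChain versions:

from langchain import LLMChain, OpenAI, SerpAPIWrapper
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent

# A Google Search tool backed by SerpAPI (requires SERPAPI_API_KEY)
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for answering questions about current events",
    ),
]

# An agent that replaces the plain-LLM ExecutionChain
prefix = "You are an AI who performs one task based on the following objective: {objective}."
suffix = "Question: {task}\n{agent_scratchpad}"
prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["objective", "task", "agent_scratchpad"],
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=[t.name for t in tools])
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

# Pass the tool-using agent in as the execution chain
baby_agi = BabyAGI.from_llm(
    llm=OpenAI(temperature=0),
    vectorstore=vectorstore,  # defined as in the previous example
    task_execution_chain=agent_executor,
    max_iterations=3,
)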

The potential for applications of BabyAGI is also immense! We can just give it an objective, and it will execute tasks for us. The only thing I think is missing is an interface to accept user feedback. For example, before BabyAGI makes an appointment for me, I'd like it to check with me first. I think Yohei is actually working on allowing real-time input so the system can dynamically adjust task prioritization.

🐋 Real-world use cases 🐋

Project 4: AutoGPT

AutoGPT is a lot like BabyAGI combined with LangChain tools. It follows a similar logic to BabyAGI: it's an infinite loop of generating thoughts, reasoning, generating a plan, criticizing it, planning the next action, and executing.

In the execution step, AutoGPT can run many commands, such as searching Google, browsing websites, writing to files, and executing Python files. It can even start and delete GPT agents?! That's pretty cool!
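Under the hood, AutoGPT asks the model to answer every step in a fixed JSON shape, thoughts plus a single command, which the loop parses and dispatches. The example below paraphrases that shape based on the repo at the time of writing; the field values are invented:

import json

# Paraphrased response format; values are made up for illustration
step = json.loads("""
{
  "thoughts": {
    "text": "I should research the market first.",
    "reasoning": "Understanding demand reduces the risk of building the wrong product.",
    "plan": "- search for trending business ideas\\n- summarize findings to a file",
    "criticism": "Do not spend too many steps on research.",
    "speak": "I am going to research the market."
  },
  "command": {"name": "google", "args": {"input": "trending online business ideas"}}
}
""")

command = step["command"]
print(command["name"], command["args"])  # the loop dispatches this to a command handler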

When you run AutoGPT, it prompts you for two initial inputs: 1) the AI's role and 2) the AI's goals. Here I'm just using the given example: building a business.

It was able to generate thoughts, reasoning, a plan, and criticism, plan the next action, and execute it (a Google search in this case):

One thing I really like about AutoGPT is that it allows human interaction (sort of). When it wants to run Google commands, it asks for authorization, so you can stop the loop before spending too much money on OpenAI API tokens. It'd be nice, though, if it also allowed conversations with humans so that we could give better directions and feedback in real time.

⭐LangChain Implementation⭐

…Coming soon…

I heard LangChain is working on this ;) Will add it once it’s implemented.

🐋 Real-world use cases 🐋

  • Write and execute Python code
  • More

Conclusion

In this article, we explored four prominent autonomous AI agent projects. Despite being in their early stages of development, they have already showcased impressive results and potential applications. However, it is worth noting that all these projects come with significant limitations and risks, such as the possibility of an agent getting stuck in a loop, hallucinations, security issues, and ethical concerns. Nevertheless, autonomous agents undoubtedly represent a promising field for the future, and I am excited to see further progress and advancements in this area.

References:

  • “Westworld” simulation — Generative Agents: Interactive Simulacra of Human Behavior: https://arxiv.org/abs/2304.03442
  • Camel — CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society: https://arxiv.org/abs/2303.17760
  • BabyAGI: https://github.com/yoheinakajima/babyagi
  • AutoGPT: https://github.com/Significant-Gravitas/Auto-GPT

. . .

By Sophia Yang on April 16, 2023

Sophia Yang is a Senior Data Scientist. Connect with me on LinkedIn, Twitter, and YouTube and join the DS/ML Book Club ❤️
