How Does an LLM Agent Work?
LLM Agent
Fig. 1. Overview of an LLM-powered autonomous agent system. (Image source: Weng, Lilian. 2023)
An LLM agent is like a smart assistant. It’s not just a chatbot that responds to questions; it’s a digital helper that can take actions, make decisions, and use various tools to accomplish tasks. Just as a human might use a calculator for math or search the internet for information, an agent can use digital tools to extend its capabilities.
The Role of LLMs in Agents
Enhancing Reasoning and Acting
LLMs can be thought of as the “brain” of an agent. They are responsible for understanding the task at hand, thinking through the problem, and deciding what actions to take. This process is often referred to as the ReAct approach, combining reasoning and acting.
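The ReAct loop can be sketched in a few lines. This is a minimal illustration, not a real implementation: the model call is stubbed out with a canned script of responses, and the only tool is a restricted calculator.

```python
# Scripted stand-in for the model's outputs at each step of the loop.
CALC_SCRIPT = [
    "Thought: I need to compute 17 * 23.\nAction: calculator[17 * 23]",
    "Thought: I have the result.\nFinal Answer: 391",
]

def fake_llm(prompt, step):
    """Hypothetical stand-in for a model call: returns scripted output."""
    return CALC_SCRIPT[step]

def calculator(expression):
    # Restricted eval: only digits, spaces, and arithmetic characters allowed.
    assert set(expression) <= set("0123456789+-*/(). ")
    return str(eval(expression))

def react_agent(question, max_steps=5):
    prompt = f"Question: {question}"
    for step in range(max_steps):
        output = fake_llm(prompt, step)                     # Reason
        if "Final Answer:" in output:
            return output.split("Final Answer:")[1].strip()
        action = output.split("Action: calculator[")[1].rstrip("]")
        observation = calculator(action)                    # Act
        prompt += f"\n{output}\nObservation: {observation}"  # Observe
    raise RuntimeError("step limit reached")

print(react_agent("What is 17 * 23?"))  # → 391
```

A real agent replaces `fake_llm` with an actual model API call and parses the model's free-form output, but the reason–act–observe cycle is the same.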
Using Tools
Just like you might use a calculator for complex math, agents can use digital tools to perform tasks they aren’t inherently good at, such as precise calculations or looking up specific information online.
LLM Components
Agent core
The agent core is the central coordination module that manages the core logic and behavioral characteristics of an agent. Think of it as the agent’s key decision-making module. It is also where we define:
General goals of the agent: The overall goals and objectives the agent works toward.
Tools for execution: Essentially a short list or a “user manual” for all the tools to which the agent has access.
Explanation for how to make use of different planning modules: Details about the utility of different planning modules and which to use in what situation.
Relevant Memory: This is a dynamic section filled at inference time with the most relevant memory items from past conversations with the user. The “relevance” is determined from the question the user asks.
Persona of the Agent (optional): This persona description is typically used to either bias the model to prefer using certain types of tools or to imbue typical idiosyncrasies in the agent’s final response.
The agent core has access to goals, tools, planning helpers, and a general format for the answer.
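One way to picture these pieces fitting together is as a prompt-assembly step. The sketch below is an assumption about structure, not a specific framework's API: each component above becomes a section of the system prompt the core hands to the model.

```python
from dataclasses import dataclass

@dataclass
class AgentCore:
    """Illustrative container for the agent core's defining components."""
    goals: str
    tools: dict            # tool name -> short "user manual" description
    planner_notes: str     # which planning module to use, and when
    persona: str = ""      # optional persona description

    def build_prompt(self, question, relevant_memory):
        tool_manual = "\n".join(
            f"- {name}: {desc}" for name, desc in self.tools.items()
        )
        sections = [
            f"Goals: {self.goals}",
            f"Tools:\n{tool_manual}",
            f"Planning: {self.planner_notes}",
            f"Relevant memory: {relevant_memory}",
        ]
        if self.persona:
            sections.insert(0, f"Persona: {self.persona}")
        return "\n\n".join(sections) + f"\n\nUser question: {question}"

core = AgentCore(
    goals="Answer finance questions accurately.",
    tools={"calculator": "evaluate arithmetic expressions"},
    planner_notes="Decompose multi-part questions before answering.",
)
print(core.build_prompt("What is our Q2 margin?", "User prefers percentages."))
```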
Memory module
Memory modules play a critical role in AI agents. A memory module can essentially be thought of as a store of the agent’s internal logs as well as interactions with a user.
There are two types of memory modules:
Short-term memory: A ledger of actions and thoughts that an agent goes through to attempt to answer a single question from a user: the agent’s “train of thought.”
Long-term memory: A ledger of actions and thoughts about events that happen between the user and agent. It is a log book that contains a conversation history stretching across weeks or months.
Memory retrieval requires more than semantic similarity. Typically, items are ranked by a composite score combining semantic similarity, importance, recency, and other application-specific metrics, and the top-scoring items are retrieved.
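A minimal sketch of such composite scoring, assuming a weighted sum of similarity, importance, and an exponential recency decay. The weights, the half-life, and the pretend similarity values are illustrative assumptions, not from the text.

```python
import time

def recency(timestamp, now, half_life_hours=24.0):
    """Exponential decay: a memory loses half its recency every half-life."""
    age_hours = (now - timestamp) / 3600.0
    return 0.5 ** (age_hours / half_life_hours)

def composite_score(memory, query_similarity, now,
                    w_sim=0.5, w_imp=0.3, w_rec=0.2):
    # Weighted sum of semantic similarity, importance, and recency.
    return (w_sim * query_similarity
            + w_imp * memory["importance"]      # 0..1, assigned at write time
            + w_rec * recency(memory["timestamp"], now))

now = time.time()
memories = [
    {"text": "User's company is ACME", "importance": 0.9,
     "timestamp": now - 30 * 86400},            # a month old
    {"text": "User asked about weather", "importance": 0.1,
     "timestamp": now - 60},                    # a minute old
]
sims = [0.8, 0.3]  # pretend cosine similarities to the current query

ranked = sorted(zip(memories, sims),
                key=lambda p: composite_score(p[0], p[1], now), reverse=True)
print(ranked[0][0]["text"])  # the old but important, highly similar memory wins
```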
Tools
Tools are well-defined executable workflows that agents can use to execute tasks. Oftentimes, they can be thought of as specialized third-party APIs.
For instance, agents can use a RAG pipeline to generate context-aware answers, a code interpreter to programmatically solve complex tasks, an API to search for information on the internet, or even a simple API service such as a weather API or an API for an instant messaging application.
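A tool in this sense can be sketched as a name, a description the agent core can show the model, and a callable. This is a generic illustration, not a particular framework's interface; the weather tool is a stub standing in for an external API call.

```python
def make_tool(name, description, fn):
    """Bundle a tool's name, 'user manual' entry, and executable workflow."""
    return {"name": name, "description": description, "run": fn}

TOOLS = {
    t["name"]: t for t in [
        make_tool("calculator", "evaluate an arithmetic expression",
                  lambda expr: str(eval(expr, {"__builtins__": {}}))),
        make_tool("weather", "current weather for a city (stubbed)",
                  lambda city: f"Weather in {city}: sunny, 22 C"),
    ]
}

def dispatch(tool_name, argument):
    """What the agent does once the model has chosen a tool and its input."""
    return TOOLS[tool_name]["run"](argument)

print(dispatch("calculator", "2 + 2"))  # → 4
print(dispatch("weather", "Paris"))
```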
Planning module
Complex problems, such as analyzing a set of financial reports to answer a layered business question, often require nuanced approaches. With an LLM-powered agent, this complexity can be dealt with by using a combination of two techniques: task decomposition, which breaks the problem into smaller subtasks, and reflection, which critiques and refines intermediate results.
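One common version of this combination is decomposition followed by reflection. The sketch below is a toy on the financial-reports example; `decompose` and `reflect` are stubs standing in for separate LLM calls.

```python
def decompose(question):
    """Task decomposition: break a layered question into subtasks (stubbed)."""
    return [
        "Extract revenue and costs from each report",
        "Compute year-over-year margin changes",
        "Synthesize a final answer to: " + question,
    ]

def reflect(draft):
    """Reflection: critique and refine an intermediate answer (stubbed)."""
    return draft + " (checked for arithmetic and unsupported claims)"

def plan_and_answer(question):
    # Execute each subtask, then pass the combined draft through reflection.
    drafts = [f"Result of subtask: {sub}" for sub in decompose(question)]
    return reflect("; ".join(drafts))

print(plan_and_answer("How did our margins change across these reports?"))
```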
Steps involved in developing your own LLM agent
1. A base LLM
2. A tool that you will be interacting with (e.g., Google Custom Search Engine, Calculator)
3. An agent to control the interaction
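The three ingredients above can be wired together in a toy controller. Here the “base LLM” is a stub that routes by keyword, an assumption standing in for a real model call that would decide which tool to use.

```python
def base_llm(question):
    """Stub model: decide which tool to call (not a real model API)."""
    if any(ch.isdigit() for ch in question):
        expr = "".join(c for c in question if c in "0123456789+-*/ ").strip()
        return ("calculator", expr)
    return ("search", question)

def calculator(expr):
    # Evaluate arithmetic with builtins disabled.
    return str(eval(expr, {"__builtins__": {}}))

def search(query):
    return f"Top result for '{query}' (stubbed search engine)"

TOOLS = {"calculator": calculator, "search": search}

def agent(question):
    tool, arg = base_llm(question)   # 1. the base LLM decides
    result = TOOLS[tool](arg)        # 2. the chosen tool executes
    return f"[{tool}] {result}"      # 3. the controller returns the answer

print(agent("What is 6 * 7?"))  # → [calculator] 42
```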
LlamaIndex is a powerful framework for building LLM agents that connect custom data sources to large language models.
Evaluating LLM agents based on context relevance, groundedness, and answer relevance helps maintain the quality of responses.
TruLens provides an open-source library for tracking and evaluating LLM experiments, offering valuable insights for improving agent performance.
Benefits of an Agent-Based Approach
Freedom and Efficiency – Agents operate autonomously, reducing the need for constant human intervention and thereby freeing humans for other activities.
Flexibility – Agents can be adapted to various needs based on prompts.
Specialization – Prompting and training allows for deep expertise in domains. This is especially advantageous in disciplines that require a thorough understanding, like healthcare.
Solving Complex Problems – Agents can succeed where humans struggle: they can quickly and efficiently solve complex problems that require intense calculation, multiple disciplines, or thorough research.
Innovation and Progress – LLM Agents provide ideas and information that can drive technology and development forward.
Multi-Agent LLMs
Multi-agent LLM systems are frameworks in which multiple LLM agents interact or collaborate to achieve complex tasks or goals. This extends the capabilities of individual LLM agents by leveraging the collective strengths and specialized expertise of multiple models. By communicating, collaborating, sharing information and insights, and allocating tasks among themselves, multi-agent LLM systems can solve problems more effectively than a single agent can, flexibly and at scale.
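A minimal sketch of this task allocation: a coordinator hands subtasks to specialized agents and merges their outputs. Both agents are stubs standing in for separate LLM calls with different system prompts; the roles and wiring are illustrative assumptions.

```python
def researcher(task):
    """Specialized agent 1: gather information (stubbed LLM call)."""
    return f"Findings on '{task}': (stubbed research notes)"

def writer(findings):
    """Specialized agent 2: turn notes into prose (stubbed LLM call)."""
    return f"Summary: {findings}"

def coordinator(task):
    # Allocate subtasks to each agent in turn and pass results along.
    notes = researcher(task)
    report = writer(notes)
    return report

print(coordinator("agent memory design"))
```

Real multi-agent frameworks add message passing, shared memory, and dynamic role assignment on top of this basic hand-off pattern.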
LLM Agents Use Cases
Customer Service and Support – Providing customer support, handling inquiries, resolving issues, and offering information 24/7.
Content Creation and Copywriting – Generating creative content, such as articles, blogs, scripts, and advertising copy.
Semantic search agents – Performing search based on meaning, as distinguished from lexical search, which matches the exact words used.
Language Translation and Localization – Translation services for various content types, aiding in bridging language barriers and localizing content for different regions.
Education and Tutoring – Functioning as personalized tutors, providing explanations, answering questions, and assisting with learning materials in a wide range of subjects.
Programming and Code Generation – Writing, reviewing, and debugging code, thereby speeding up the development process and helping in learning programming languages.
Research and Data Analysis – Sifting through large volumes of text, summarizing information, and extracting relevant data, which is invaluable for research and analysis.
Healthcare Assistance – Offering support in areas like patient interaction, medical documentation, and even as assistive tools for diagnosis and treatment planning, though they don’t replace professional medical advice.
Personal Assistants – Managing schedules, setting reminders, answering questions, and even helping with email management and other administrative tasks.
Legal and Compliance Assistance – Assisting in legal research, document review, and drafting legal documents (without replacing professional legal advice).
Accessibility Tools – Enhancing accessibility through tools like voice-to-text conversion, reading assistance, and simplifying complex text.
Interactive Entertainment – In gaming and interactive storytelling, creating dynamic narratives, character dialogue, and responsive storytelling elements.
Marketing and Customer Insights – Analyzing customer feedback, conducting sentiment analysis, and generating marketing content, providing valuable insights into consumer behavior.
Social Media Management – Managing social media content, from generating posts to analyzing trends and engaging with audiences.
Human Resources Management – Aiding in resume screening, answering employee queries, and even in training and development activities.
Challenges
Limited context capacity: LLMs are constrained by finite context lengths, limiting their ability to incorporate historical data and complex instructions. This constraint hampers performance, particularly in tasks requiring a deep contextual understanding. To address this challenge, efforts should focus on expanding the context window, allowing LLMs to grasp more historical information and detailed instructions.
Long-term planning: Unlike humans, who can adjust plans in response to unexpected circumstances, LLMs find it difficult to deviate from predefined paths. To mitigate this challenge, mechanisms must be developed that enable agents to adapt their plans when confronted with unexpected errors. Agents should possess the capacity to handle deviations and adjust their strategies to attain desired outcomes.
Natural language interface: LLMs rely heavily on natural language interfaces to communicate with external components, such as memory and tools. However, the reliability of model outputs can be uncertain: LLMs may produce unreliable results, run into formatting issues, or simply refuse to follow instructions. Ensuring the reliability and accuracy of these interfaces is crucial for optimal agent performance, which requires refining the natural language generation capabilities of LLMs to produce more accurate and contextually relevant responses.
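The limited-context challenge above is often mitigated by keeping only a sliding window of the most recent conversation turns within a fixed token budget. A minimal sketch, using whitespace word counts as a stand-in for real tokenization (an assumption; production systems use the model's own tokenizer):

```python
def fit_to_window(turns, max_tokens):
    """Keep the newest turns whose combined cost fits the budget."""
    kept, used = [], 0
    for turn in reversed(turns):        # walk newest-first
        cost = len(turn.split())        # crude proxy for token count
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [
    "user: hi",
    "bot: hello there",
    "user: summarize our plan in detail",
]
print(fit_to_window(history, max_tokens=8))
```

Dropping old turns loses information, which is why real systems pair windowing with long-term memory retrieval like the composite scoring described earlier.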