Chapter 1: The LLM - Your Agent’s Brainpower
Welcome to the OpenManus tutorial! We’re thrilled to have you on board. Let’s start with the absolute core of any intelligent agent: the “brain” that does the thinking and understanding. In OpenManus, this brainpower comes from a Large Language Model (LLM), and we interact with it using our `LLM` class.
What’s the Big Deal with LLMs?
Imagine you have access to an incredibly smart expert who understands language, can reason, write, summarize, and even generate creative ideas. That’s kind of what an LLM (like GPT-4, Claude, or Llama) is! These are massive AI models trained on vast amounts of text and data, making them capable of understanding and generating human-like text.
They are the engine that drives the “intelligence” in AI applications like chatbots, writing assistants, and, of course, the agents you’ll build with OpenManus.
Why Do We Need an `LLM` Class?
Okay, so LLMs are powerful. Can’t our agent just talk directly to them?
Well, it’s a bit more complicated than a casual chat. Talking to these big AI models usually involves:
- Complex APIs: Each LLM provider (like OpenAI, Anthropic, Google, AWS) has its own specific way (an API or Application Programming Interface) to send requests and get responses. It’s like needing different phone numbers and dialing procedures for different experts.
- API Keys: You need secret keys to prove you’re allowed to use the service (and get billed for it!). Managing these securely is important.
- Formatting: You need to structure your questions (prompts) and conversation history in a very specific format the LLM understands.
- Errors & Retries: Sometimes network connections hiccup, or the LLM service is busy. You need a way to handle these errors gracefully, maybe by trying again.
- Tracking Usage (Tokens): Using these powerful models costs money, often based on how much text you send and receive (measured in “tokens”). You need to keep track of this.
Doing all this every time an agent needs to think would be repetitive and messy!
This is where the `LLM` class comes in. Think of it as a super-helpful translator and network manager rolled into one.
- It knows how to talk to different LLM APIs.
- It securely handles your API keys (using settings from the Configuration).
- It formats your messages correctly.
- It automatically retries if there’s a temporary glitch.
- It helps count the “tokens” used.
It hides all that complexity, giving your agent a simple way to “ask” the LLM something.
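To see what is being hidden, here is roughly what a single question looks like when you call one provider directly with the official OpenAI Python SDK. This is an illustrative sketch only: the model name and the environment-variable handling are assumptions, not OpenManus code.

```python
# A rough sketch of talking to one provider directly, without the LLM class.
# Illustrative only: the model name and key handling here are assumptions.
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])  # you manage the key yourself

async def ask_directly(question: str) -> str:
    # You build the message dicts, pick a model, and handle errors yourself.
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```

Multiply that by every provider you want to support, then add retries and token accounting, and you can see why wrapping it all once in a dedicated class pays off.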
Use Case: Let’s say we want our agent to simply answer the question: “What is the capital of France?” The `LLM` class will handle all the background work to get that answer from the actual AI model.
How Do Agents Use the `LLM` Class?
In OpenManus, agents (which we’ll learn more about in Chapter 3: BaseAgent) have an `llm` component built in. Usually, you don’t even need to create it manually; the agent does it for you when it starts up, using settings from your configuration file (`config/config.toml`).
The primary way an agent uses the `LLM` class is through its `ask` method.
Let’s look at a simplified example of how you might use the `LLM` class directly (though usually, your agent handles this):
```python
# Import necessary classes
from app.llm import LLM
from app.schema import Message
import asyncio  # Needed to run asynchronous code

# Assume configuration is already loaded (API keys, model name, etc.)

# Create an instance of the LLM class (using default settings)
llm_interface = LLM()

# Prepare the question as a list of messages
# (We'll learn more about Messages in Chapter 2)
conversation = [
    Message.user_message("What is the capital of France?")
]

# Define an async function to ask the question
async def ask_question():
    print("Asking the LLM...")
    # Use the 'ask' method to send the conversation
    response = await llm_interface.ask(messages=conversation)
    print(f"LLM Response: {response}")

# Run the async function
asyncio.run(ask_question())
```
Explanation:
- We import the `LLM` class and the `Message` class (more on `Message` in the next chapter).
- We create `llm_interface = LLM()`. This sets up our connection to the LLM using settings found in the configuration.
- We create a `conversation` list containing our question, formatted as a `Message` object. The `LLM` class needs the input in this list-of-messages format.
- We call `await llm_interface.ask(messages=conversation)`. This is the core action! We send our message list to the LLM via our interface. The `await` keyword is used because communicating over the network takes time, so we wait for the response asynchronously.
- The `ask` method returns the LLM’s text response as a string.
Example Output (might vary slightly):
```
Asking the LLM...
LLM Response: The capital of France is Paris.
```
See? We just asked a question and got an answer, without worrying about API keys, JSON formatting, or network errors! The `LLM` class handled it all.
There’s also a more advanced method called `ask_tool`, which allows the LLM to use specific Tools, but we’ll cover that later. For now, `ask` is the main way to get text responses.
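One practical note: as we’ll see below, `ask` can raise a `TokenLimitExceeded` error if your input is too large for the configured limits. Here is a minimal sketch of guarding against that, assuming the exception is importable from `app.exceptions` (the exact path may differ in your OpenManus version):

```python
# Hypothetical guard around ask(); the import path for TokenLimitExceeded is an assumption.
from app.exceptions import TokenLimitExceeded

async def safe_ask(llm_interface, conversation):
    try:
        return await llm_interface.ask(messages=conversation)
    except TokenLimitExceeded:
        # The request was never sent; trim the conversation or raise to the caller.
        return "Sorry, the conversation is too long for the configured token limit."
```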
Under the Hood: What Happens When You `ask`?
Let’s peek behind the curtain. When your agent calls `llm.ask(...)`, several things happen in sequence:
- Format Messages: The `LLM` class takes your list of `Message` objects and converts them into the exact dictionary format the specific LLM API (like OpenAI’s or AWS Bedrock’s) expects. This might involve adding special tags or structuring image data if needed (`llm.py: format_messages`).
- Count Tokens: It calculates roughly how many “tokens” your input messages will use (`llm.py: count_message_tokens`); see the rough sketch after this list.
- Check Limits: It checks if sending this request would exceed any configured token limits (`llm.py: check_token_limit`). If it does, it raises a specific `TokenLimitExceeded` error before making the expensive API call.
- Send Request: It sends the formatted messages and other parameters (like the desired model and `max_tokens`) to the LLM’s API endpoint over the internet (`llm.py: client.chat.completions.create`, or the equivalent for AWS Bedrock in `bedrock.py`).
- Handle Glitches (Retry): If the API call fails due to a temporary issue (like a network timeout or the service being momentarily busy), the `LLM` class automatically waits a bit and tries again, up to a few times (thanks to the `@retry` decorator in `llm.py`).
- Receive Response: Once successful, it receives the response from the LLM API.
- Extract Answer: It pulls out the actual text content from the API response.
- Update Counts: It records the number of input tokens used and the number of tokens in the received response (`llm.py: update_token_count`).
- Return Result: Finally, it returns the LLM’s text answer back to your agent.
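The token counting mentioned above is typically done with a tokenizer library such as `tiktoken`. Here is a minimal sketch of the idea; it is approximate only, and the real `count_message_tokens` in `llm.py` is more thorough (per-message overhead, images, tool calls, and so on):

```python
# Approximate token counting with tiktoken; the real count_message_tokens in
# llm.py is more thorough (per-message overhead, images, tool calls, etc.).
import tiktoken

def rough_token_count(formatted_messages: list[dict], model: str = "gpt-4o") -> int:
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")  # fallback for unknown models
    total = 0
    for message in formatted_messages:
        content = message.get("content") or ""
        if isinstance(content, str):
            total += len(encoding.encode(content))
    return total

# Example: rough_token_count([{"role": "user", "content": "What is the capital of France?"}])
```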
Here’s a simplified diagram showing the flow:
```mermaid
sequenceDiagram
    participant Agent
    participant LLMClass as LLM Class (app/llm.py)
    participant TokenCounter as Token Counter (app/llm.py)
    participant OpenAIClient as OpenAI/Bedrock Client (app/llm.py, app/bedrock.py)
    participant LLM_API as Actual LLM API (e.g., OpenAI, AWS Bedrock)

    Agent->>+LLMClass: ask(messages)
    LLMClass->>LLMClass: format_messages(messages)
    LLMClass->>+TokenCounter: count_message_tokens(formatted_messages)
    TokenCounter-->>-LLMClass: input_token_count
    LLMClass->>LLMClass: check_token_limit(input_token_count)
    Note over LLMClass: If limit exceeded, raise Error.
    LLMClass->>+OpenAIClient: create_completion(formatted_messages, model, ...)
    Note right of OpenAIClient: Handles retries on network errors etc.
    OpenAIClient->>+LLM_API: Send HTTP Request
    LLM_API-->>-OpenAIClient: Receive HTTP Response
    OpenAIClient-->>-LLMClass: completion_response
    LLMClass->>LLMClass: extract_content(completion_response)
    LLMClass->>+TokenCounter: update_token_count(input_tokens, completion_tokens)
    TokenCounter-->>-LLMClass: counts updated
    LLMClass-->>-Agent: llm_answer (string)
```
Let’s look at a tiny piece of the `ask` method in `app/llm.py` to see the retry mechanism:
```python
# Simplified snippet from app/llm.py
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type
from openai import OpenAIError
# ... other imports ...

class LLM:
    # ... other methods like __init__, format_messages ...

    @retry(  # This decorator handles retries!
        wait=wait_random_exponential(min=1, max=60),  # Wait 1-60s between tries
        stop=stop_after_attempt(6),  # Give up after 6 tries
        retry=retry_if_exception_type((OpenAIError, Exception)),  # Retry on these errors
    )
    async def ask(
        self,
        messages: List[Union[dict, Message]],
        # ... other parameters ...
    ) -> str:
        try:
            # 1. Format messages (simplified)
            formatted_msgs = self.format_messages(messages)

            # 2. Count tokens & check limits (simplified)
            input_tokens = self.count_message_tokens(formatted_msgs)
            if not self.check_token_limit(input_tokens):
                raise TokenLimitExceeded(...)  # Special error, not retried

            # 3. Prepare API call parameters (simplified)
            params = {"model": self.model, "messages": formatted_msgs, ...}

            # 4. Make the actual API call (simplified)
            response = await self.client.chat.completions.create(**params)

            # 5. Process response & update token counts (simplified)
            answer = response.choices[0].message.content
            self.update_token_count(response.usage.prompt_tokens, ...)
            return answer
        except TokenLimitExceeded:
            raise  # Don't retry token limits
        except Exception as e:
            logger.error(f"LLM ask failed: {e}")
            raise  # Let the @retry decorator handle retrying other errors
```
Explanation:
- The `@retry(...)` part above the `async def ask(...)` line is key. It tells Python: “If the code inside this `ask` function fails with certain errors (like `OpenAIError`), wait a bit and try running it again, up to 6 times.”
- Inside the `try...except` block, the code performs the steps we discussed: format, count, check, call the API (`self.client.chat.completions.create`), and process the result.
- Crucially, it catches the `TokenLimitExceeded` error separately and `raise`s it again immediately – we don’t want to retry if we know we’ve run out of tokens!
- Other errors will be caught by the final `except Exception`, logged, and re-raised, allowing the `@retry` mechanism to decide whether to try again.
This shows how the `LLM` class uses libraries like `tenacity` to add resilience without cluttering the main logic of your agent.
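If `tenacity` is new to you, here is a tiny self-contained demo of the same retry pattern outside of OpenManus (the `flaky_call` function is made up purely for illustration):

```python
# Standalone demo of the retry pattern tenacity provides.
# flaky_call is a made-up function that randomly fails to simulate network hiccups.
import random
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

@retry(
    wait=wait_random_exponential(min=1, max=5),     # wait 1-5 seconds between attempts
    stop=stop_after_attempt(3),                     # give up after 3 attempts
    retry=retry_if_exception_type(ConnectionError), # only retry this error type
)
def flaky_call() -> str:
    if random.random() < 0.5:
        raise ConnectionError("simulated network hiccup")
    return "success"

print(flaky_call())  # transparently retried on ConnectionError, just like ask()
```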
Wrapping Up Chapter 1
You’ve learned about the core “brain” – the Large Language Model (LLM) – and why we need the `LLM` class in OpenManus to interact with it smoothly. This class acts as a vital interface, handling API complexities, errors, and token counting, and provides your agents with simple `ask` (and `ask_tool`) methods.
Now that we understand how to communicate with the LLM, we need a way to structure the conversation – keeping track of who said what. That’s where Messages and Memory come in.
Let’s move on to Chapter 2: Message / Memory to explore how we represent and store conversations for our agents.
Generated by AI Codebase Knowledge Builder