Chapter 6: Message Manager - Keeping the Conversation Straight
In the previous chapter, we learned how the Action Controller and Registry act as the Agent’s “hands” and “toolbox”, executing the specific actions decided by the LLM planner. But how does the LLM get all the information it needs to make those decisions in the first place? How does the Agent keep track of the ongoing conversation, including what it “saw” on the page and what happened after each action?
Imagine you’re having a long, multi-step discussion with an assistant about a complex task. If the assistant has a poor memory, they might forget earlier instructions, the current status, or previous results, making it impossible to proceed correctly. LLMs face a similar challenge: they need the conversation history for context, but they have a limited memory (called the “context window”).
This is the problem the Message Manager solves.
What Problem Does the Message Manager Solve?
The Agent needs to have a conversation with the LLM. This conversation isn’t just chat; it includes:
- Initial Instructions: The core rules from the System Prompt.
- The Task: The overall goal the Agent needs to achieve.
- Observations: What the Agent currently “sees” in the browser (BrowserContext state, including the DOM Representation).
- Action Results: What happened after the last action was performed (Action Controller & Registry).
- LLM’s Plan: The sequence of actions the LLM decided on.
The Message Manager solves several key problems:
- Organizes History: It structures the conversation chronologically, keeping track of who said what (System, User/Agent State, AI/LLM Plan).
- Formats Messages: It ensures the browser state, action results, and even images are formatted correctly so the LLM can understand them.
- Tracks Size: It keeps count of the “tokens” (roughly, words or parts of words) used in the conversation history.
- Manages Limits: It helps prevent the conversation history from exceeding the LLM’s context window limit, potentially by removing older parts of the conversation if it gets too long.
Think of the MessageManager as a meticulous secretary for the Agent-LLM conversation. It takes clear, concise notes, presents the current situation accurately, and ensures the conversation doesn’t ramble on for too long, keeping everything within the LLM’s “attention span”.
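To make the idea concrete, here is a minimal sketch of what such a managed history looks like, using the langchain_core message classes this project builds on. The content strings, the example action, and the 3-characters-per-token heuristic are illustrative placeholders, not the real prompts:

from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# Illustrative history after one step (the real messages are built by MessageManager)
history = [
    SystemMessage(content='You are a browser automation agent. Respond with JSON actions...'),
    HumanMessage(content='Your ultimate task is: """Find the cheapest flight to Tokyo"""'),
    HumanMessage(content='Current url: https://example.com\nInteractive elements:\n[12]<button>Search</button>'),
    AIMessage(  # The LLM's plan for step 1, recorded as a tool call
        content='',
        tool_calls=[{'name': 'AgentOutput',
                     'args': {'action': [{'click_element': {'index': 12}}]},
                     'id': '1', 'type': 'tool_call'}],
    ),
]

# Rough size estimate, assuming ~3 characters per token (a common heuristic)
approx_tokens = sum(len(str(m.content)) for m in history) // 3
print(f'History holds {len(history)} messages, ~{approx_tokens} tokens')

Every message the Agent sends or receives gets appended to a list like this, and the token estimate is what the MessageManager watches to stay inside the context window.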
Meet the Message Manager: The Conversation Secretary
The MessageManager (found in agent/message_manager/service.py) is responsible for managing the list of messages that are sent to the LLM in each step.
Here are its main jobs:
1. Initialization: When the Agent starts, the MessageManager is created. It immediately adds the foundational messages:
   - The SystemMessage containing the rules from the System Prompt.
   - A HumanMessage stating the overall task.
   - Other initial setup messages (like examples or sensitive data placeholders).
2. Adding Browser State: Before asking the LLM what to do next, the Agent gets the current BrowserState. It then tells the MessageManager to add this information as a HumanMessage. This message includes the simplified DOM map, the current URL, and potentially a screenshot (if use_vision is enabled). It also includes the results (ActionResult) from the previous step, so the LLM knows what happened last.
3. Adding LLM Output: After the LLM responds with its plan (AgentOutput), the Agent tells the MessageManager to add this plan as an AIMessage. This typically includes the LLM’s reasoning and the list of actions to perform.
4. Adding Action Results (Indirectly): The results from the Controller.act call (ActionResult) aren’t added as separate messages after the action. Instead, they are included in the next HumanMessage that contains the browser state (see step 2). This keeps the context tight: “Here’s the current page, and here’s what happened right before we got here.”
5. Providing Messages to LLM: When the Agent is ready to call the LLM, it asks the MessageManager for the current conversation history (get_messages()).
6. Token Management: Every time a message is added, the MessageManager calculates how many tokens it adds (_count_tokens) and updates the total. If the total exceeds the limit (max_input_tokens), it triggers a truncation strategy (cut_messages) to shorten the history, typically by removing the screenshot from the most recent state message first, then trimming its text.
How the Agent Uses the Message Manager
Let’s revisit the simplified Agent.step method from Chapter 1 and highlight the MessageManager interactions (using self._message_manager):
# --- File: agent/service.py (Simplified step method - Highlighting MessageManager) ---
class Agent:
    # ... (init, run) ...

    async def step(self, step_info: Optional[AgentStepInfo] = None) -> None:
        logger.info(f'📍 Step {self.state.n_steps}')
        state = None
        model_output = None
        result: list[ActionResult] = []

        try:
            # 1. Get current state from the browser
            state = await self.browser_context.get_state()  # Uses BrowserContext

            # 2. Add state + PREVIOUS result to message history via MessageManager
            #    'self.state.last_result' holds the outcome of the *previous* step's action
            self._message_manager.add_state_message(
                state,
                self.state.last_result,  # Result from previous action
                step_info,
                self.settings.use_vision,  # Tell it whether to include image
            )

            # 3. Get the complete, formatted message history for the LLM
            input_messages = self._message_manager.get_messages()

            # 4. Get LLM's decision on the next action(s)
            model_output = await self.get_next_action(input_messages)  # Calls the LLM

            # --- Agent increments step counter ---
            self.state.n_steps += 1

            # 5. Remove the potentially large state message before adding the compact
            #    AI response (an optimization in the provided code)
            self._message_manager._remove_last_state_message()

            # 6. Add the LLM's response (the plan) to the history
            self._message_manager.add_model_output(model_output)

            # 7. Execute the action(s) using the Controller
            result = await self.multi_act(model_output.action)  # Uses Controller

            # 8. Store the result of THIS action. It will be used in the *next* step's
            #    call to self._message_manager.add_state_message()
            self.state.last_result = result

            # ... (Record step details, handle success/failure) ...
        except Exception as e:
            # Handle errors...
            result = await self._handle_step_error(e)
            self.state.last_result = result
        # ... (finally block) ...
This flow shows the cycle: add state/previous result -> get messages -> call LLM -> add LLM response -> execute action -> store result for next state message.
How it Works Under the Hood: Managing the Flow
Let’s visualize the key interactions during one step of the Agent loop involving the MessageManager:
sequenceDiagram
    participant Agent
    participant BC as BrowserContext
    participant MM as MessageManager
    participant LLM
    participant Controller

    Note over Agent: Start of step
    Agent->>BC: get_state()
    BC-->>Agent: Current BrowserState (DOM map, URL, screenshot?)
    Note over Agent: Have BrowserState and `last_result` from previous step
    Agent->>MM: add_state_message(BrowserState, last_result)
    MM->>MM: Format state/result into HumanMessage (with text/image)
    MM->>MM: Calculate tokens for new message
    MM->>MM: Add HumanMessage to internal history list
    MM->>MM: Update total token count
    MM->>MM: Check token limit, potentially call cut_messages()
    Note over Agent: Ready to ask LLM
    Agent->>MM: get_messages()
    MM-->>Agent: Return List[BaseMessage] (System, Task, State1, Plan1, State2...)
    Agent->>LLM: Invoke LLM with message list
    LLM-->>Agent: LLM Response (AgentOutput containing plan)
    Note over Agent: Got LLM's plan
    Agent->>MM: _remove_last_state_message() (optimization)
    MM->>MM: Remove last (large) HumanMessage from list
    Agent->>MM: add_model_output(AgentOutput)
    MM->>MM: Format plan into AIMessage (with tool calls)
    MM->>MM: Calculate tokens for AIMessage
    MM->>MM: Add AIMessage to internal history list
    MM->>MM: Update total token count
    Note over Agent: Ready to execute plan
    Agent->>Controller: multi_act(AgentOutput.action)
    Controller-->>Agent: List[ActionResult] (Result of this step's actions)
    Agent->>Agent: Store ActionResult in `self.state.last_result` (for next step)
    Note over Agent: End of step
This shows how MessageManager sits between the Agent, the Browser State, and the LLM, managing the history list and token counts.
Diving Deeper into the Code (agent/message_manager/service.py)
Let’s look at simplified versions of key methods in MessageManager.
1. Initialization (__init__ and _init_messages)
When the Agent creates the MessageManager, it passes the task and the already-formatted SystemMessage.
# --- File: agent/message_manager/service.py (Simplified __init__) ---
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage, ToolMessage
# ... other imports ...
from browser_use.agent.views import MessageManagerState  # Internal state storage
from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage  # Message wrapper

class MessageManager:
    def __init__(
        self,
        task: str,
        system_message: SystemMessage,  # Received from Agent
        settings: MessageManagerSettings = MessageManagerSettings(),
        state: MessageManagerState = MessageManagerState(),  # Stores history
    ):
        self.task = task
        self.settings = settings  # Max tokens, image settings, etc.
        self.state = state  # Holds the 'history' object
        self.system_prompt = system_message

        # Only initialize if history is empty (e.g., not resuming from saved state)
        if len(self.state.history.messages) == 0:
            self._init_messages()

    def _init_messages(self) -> None:
        """Add the initial fixed messages to the history."""
        # Add the main system prompt (rules)
        self._add_message_with_tokens(self.system_prompt)

        # Add the user's task
        task_message = HumanMessage(
            content=f'Your ultimate task is: """{self.task}"""...'
        )
        self._add_message_with_tokens(task_message)

        # Add other setup messages (context, sensitive data info, examples)
        # ... (simplified - see full code for details) ...

        # Example: Add a placeholder for where the main history begins
        placeholder_message = HumanMessage(content='[Your task history memory starts here]')
        self._add_message_with_tokens(placeholder_message)
This sets up the foundational context for the LLM.
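Based on the simplified constructor above, creating a manager and peeking at its seed messages might look like the following sketch. It assumes default MessageManagerSettings and the ManagedMessage wrapper (message plus metadata) used throughout this chapter; treat it as illustrative rather than a verbatim API walkthrough:

from langchain_core.messages import SystemMessage

# Sketch: build a manager and list the foundational messages it seeded
mm = MessageManager(
    task='Find the cheapest flight to Tokyo',
    system_message=SystemMessage(content='You are a browser automation agent...'),
)

for managed in mm.state.history.messages:  # ManagedMessage = message + metadata
    print(f'{type(managed.message).__name__}: {managed.metadata.tokens} tokens')
# Expected order: SystemMessage (rules), HumanMessage (task),
# then setup messages ending with the history placeholder.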
2. Adding Browser State (add_state_message)
This method takes the current BrowserState and the previous ActionResult, formats them into a HumanMessage (potentially multi-modal with image and text parts), and adds it to the history.
# --- File: agent/message_manager/service.py (Simplified add_state_message) ---
# ... imports ...
from typing import List, Optional
from langchain_core.messages import HumanMessage
from browser_use.browser.views import BrowserState
from browser_use.agent.views import ActionResult, AgentStepInfo
from browser_use.agent.prompts import AgentMessagePrompt  # Helper to format state

class MessageManager:
    # ... (init) ...

    def add_state_message(
        self,
        state: BrowserState,  # The current view of the browser
        result: Optional[List[ActionResult]] = None,  # Result from *previous* action
        step_info: Optional[AgentStepInfo] = None,
        use_vision=True,  # Flag to include screenshot
    ) -> None:
        """Add browser state and previous result as a human message."""
        # Add any 'memory' messages from the previous result first (if any)
        if result:
            for r in result:
                if r.include_in_memory and (r.extracted_content or r.error):
                    content = f'Action result: {r.extracted_content}' if r.extracted_content else f'Action error: {r.error}'
                    msg = HumanMessage(content=content)
                    self._add_message_with_tokens(msg)
            result = None  # Don't include again in the main state message

        # Use a helper class to format the BrowserState (+ optional remaining result)
        # into the correct message structure (text + optional image)
        state_prompt = AgentMessagePrompt(
            state,
            result,  # Pass any remaining result info
            include_attributes=self.settings.include_attributes,
            step_info=step_info,
        )

        # Get the formatted message (could be complex list for vision)
        state_message = state_prompt.get_user_message(use_vision)

        # Add the formatted message (with token calculation) to history
        self._add_message_with_tokens(state_message)
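With use_vision enabled, the resulting HumanMessage typically carries list-valued content in LangChain’s multi-modal format: one text part holding the DOM map and URL, plus one image part holding the screenshot as a base64 data URL. Here is a sketch of that shape; all values are made up, and the exact text layout comes from AgentMessagePrompt:

from langchain_core.messages import HumanMessage

# Illustrative shape of a vision-enabled state message
state_message = HumanMessage(
    content=[
        {
            'type': 'text',
            'text': (
                'Action result: Clicked "Search"\n'
                'Current url: https://example.com/results\n'
                'Interactive elements:\n'
                '[12]<button>Next page</button>'
            ),
        },
        {
            'type': 'image_url',
            # The screenshot travels inline as a base64 data URL
            'image_url': {'url': 'data:image/png;base64,iVBORw0KGgo...'},
        },
    ]
)

This list structure is also why _count_tokens (shown later) iterates over content items, charging a fixed cost for 'image_url' parts and a length-based estimate for 'text' parts.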
3. Adding Model Output (add_model_output)
This takes the LLM’s plan (AgentOutput) and formats it as an AIMessage with the specific “tool calls” structure that many models expect.
# --- File: agent/message_manager/service.py (Simplified add_model_output) ---
# ... imports ...
from langchain_core.messages import AIMessage, ToolMessage
from browser_use.agent.views import AgentOutput

class MessageManager:
    # ... (init, add_state_message) ...

    def add_model_output(self, model_output: AgentOutput) -> None:
        """Add model output (the plan) as an AI message with tool calls."""
        # Format the output according to OpenAI's tool-calling standard
        tool_calls = [
            {
                'name': 'AgentOutput',  # The 'tool' name
                'args': model_output.model_dump(mode='json', exclude_unset=True),  # The LLM's JSON output
                'id': str(self.state.tool_id),  # Unique ID for the call
                'type': 'tool_call',
            }
        ]

        # Create the AIMessage containing the tool calls
        msg = AIMessage(
            content='',  # Content is often empty when using tool calls
            tool_calls=tool_calls,
        )

        # Add it to history
        self._add_message_with_tokens(msg)

        # Add a corresponding empty ToolMessage (required by some models)
        self.add_tool_message(content='')  # Content depends on tool execution result

    def add_tool_message(self, content: str) -> None:
        """Add a tool message to history (often confirms tool call receipt/result)."""
        # ToolMessage links back to the AIMessage's tool_call_id
        msg = ToolMessage(content=content, tool_call_id=str(self.state.tool_id))
        self.state.tool_id += 1  # Increment for next potential tool call
        self._add_message_with_tokens(msg)
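Putting the two methods together, each step appends an AIMessage/ToolMessage pair to the history. An illustrative result is sketched below; the args dict simply mirrors whatever JSON the LLM returned, and the field values here are examples rather than the exact AgentOutput schema:

from langchain_core.messages import AIMessage, ToolMessage

# Illustrative pair appended for one step (values are examples)
plan = AIMessage(
    content='',
    tool_calls=[{
        'name': 'AgentOutput',
        'args': {'action': [{'click_element': {'index': 12}}]},  # The LLM's JSON plan
        'id': '7',
        'type': 'tool_call',
    }],
)
receipt = ToolMessage(content='', tool_call_id='7')  # Linked back via the same id

Pairing every tool call with a matching ToolMessage keeps the history valid for providers that reject an AIMessage tool call without a corresponding tool response.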
4. Adding Messages and Counting Tokens (_add_message_with_tokens, _count_tokens)
This is the core function called by others to add any message to the history, ensuring token counts are tracked.
# --- File: agent/message_manager/service.py (Simplified _add_message_with_tokens) ---
# ... imports ...
from langchain_core.messages import BaseMessage
from browser_use.agent.message_manager.views import MessageMetadata, ManagedMessage

class MessageManager:
    # ... (other methods) ...

    def _add_message_with_tokens(self, message: BaseMessage, position: int | None = None) -> None:
        """Internal helper to add any message with its token count metadata."""
        # 1. Optionally filter sensitive data (replace actual data with placeholders)
        # if self.settings.sensitive_data:
        #     message = self._filter_sensitive_data(message)  # Simplified

        # 2. Count the tokens in the message
        token_count = self._count_tokens(message)

        # 3. Create metadata object
        metadata = MessageMetadata(tokens=token_count)

        # 4. Add the message and its metadata to the history list
        #    (self.state.history is a MessageHistory object)
        self.state.history.add_message(message, metadata, position)
        # Note: self.state.history.add_message also updates the total token count

        # 5. Check if history exceeds the token limit and truncate if needed
        self.cut_messages()  # Check and potentially trim history

    def _count_tokens(self, message: BaseMessage) -> int:
        """Estimate tokens in a message."""
        tokens = 0
        if isinstance(message.content, list):  # Multi-modal (text + image)
            for item in message.content:
                if isinstance(item, dict) and 'image_url' in item:
                    # Add a fixed cost for images
                    tokens += self.settings.image_tokens
                elif isinstance(item, dict) and 'text' in item:
                    # Estimate tokens based on text length
                    tokens += len(item['text']) // self.settings.estimated_characters_per_token
        elif isinstance(message.content, str):  # Text message
            text = message.content
            if hasattr(message, 'tool_calls'):  # Add tokens for tool call structure
                text += str(getattr(message, 'tool_calls', ''))
            tokens += len(text) // self.settings.estimated_characters_per_token
        return tokens

    def cut_messages(self):
        """Trim messages if total tokens exceed the limit."""
        # Calculate how many tokens we are over the limit
        diff = self.state.history.current_tokens - self.settings.max_input_tokens
        if diff <= 0:
            return  # We are within limits

        logger.debug(f'Token limit exceeded by {diff}. Trimming history.')

        # Strategy:
        # 1. Try removing the image from the *last* (most recent) state message, if present.
        #    (The full code finds the last message, checks its content list,
        #    removes the image item, and updates the token counts.)
        # ... (simplified - see full code for the image removal logic) ...

        # 2. If still over the limit (or no image was present), trim text content
        #    from the *end* of the last state message: calculate the proportion to
        #    remove, shorten the string, and create a new message.
        # ... (simplified - see full code for the text trimming logic) ...

        # Ensure we don't get stuck if trimming isn't enough (raise an error)
        if self.state.history.current_tokens > self.settings.max_input_tokens:
            raise ValueError('Max token limit reached even after trimming.')
This shows the basic mechanics of adding messages, calculating their approximate size, and applying strategies to keep the history within the LLM’s context window limit.
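As a worked example of that arithmetic, suppose the settings use a fixed per-image cost and a characters-per-token estimate. The numbers below are illustrative, not the library’s guaranteed defaults:

# Worked example of the trimming arithmetic (illustrative numbers)
max_input_tokens = 128_000
image_tokens = 800                 # fixed cost charged per screenshot
chars_per_token = 3                # rough text-length heuristic

current_tokens = 129_000
diff = current_tokens - max_input_tokens   # 1_000 tokens over budget

# Step 1: drop the screenshot from the newest state message (saves 800 tokens)
current_tokens -= image_tokens     # 128_200 -> still 200 tokens over

# Step 2: trim text from the end of that message proportionally.
last_msg_tokens = 2_000            # estimated size of its remaining text part
proportion = (current_tokens - max_input_tokens) / last_msg_tokens  # 0.1
chars_to_remove = int(last_msg_tokens * chars_per_token * proportion)
print(f'Remove ~{chars_to_remove} characters from the last state message')  # ~600

Removing roughly 600 characters at ~3 characters per token recovers the remaining ~200 tokens, bringing the history back under the limit without touching the system prompt or task message.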
Conclusion
The MessageManager is the Agent’s conversation secretary. It meticulously records the dialogue between the Agent (reporting browser state and action results) and the LLM (providing analysis and action plans), starting from the initial System Prompt and task definition.
Crucially, it formats these messages correctly, tracks the conversation’s size using token counts, and implements strategies to keep the history concise enough for the LLM’s limited context window. Without the MessageManager, the Agent would quickly lose track of the conversation, and the LLM wouldn’t have the necessary context to guide the browser effectively.
Many of the objects managed and passed around by the MessageManager, like BrowserState, ActionResult, and AgentOutput, are defined as specific data structures. In the next chapter, we’ll take a closer look at these important Data Structures (Views).
Next Chapter: Data Structures (Views)
Generated by AI Codebase Knowledge Builder