Welcome back! In Chapter 3: Tool, we equipped our agent with “superpowers” – tools like web search or calculators that let it interact with the world and perform actions. We saw how the agent’s “brain” (the LLM) decides which tool to use, and the agent executes it.
But wait… how does the agent remember what it has already done? If it searches for the capital of France in Step 1, how does it remember “Paris” when deciding what to do in Step 2 (like finding the weather in Paris)?
This is where AgentMemory comes in. Think of it as the agent’s dedicated notepad or, even better, a ship’s logbook.
The Problem: An Agent with Amnesia
Imagine a captain sailing a ship on a long voyage. After each hour, they completely forget everything that happened before – the course they set, the islands they passed, the storms they weathered. How could they possibly reach their destination? They’d be lost!
❓ “Where am I? What was I doing?”
An agent without memory is like that forgetful captain. It might perform a single action correctly, but it wouldn’t understand the context. It wouldn’t know:
What the original goal (task) was.
What steps it has already taken.
What results (observations) it got from those steps.
What errors it might have encountered.
Without this history, the agent can’t make informed decisions about what to do next. It can’t build upon previous results or learn from mistakes within the same task.
The Solution: The Ship’s Logbook (AgentMemory)
The AgentMemory is the component that solves this problem. It automatically records every significant event during the agent’s “voyage” (its execution run).
📜 “Log Entry: Searched ‘Capital of France’. Result: ‘Paris’.”
Just like a ship’s logbook helps the captain navigate, the AgentMemory helps the agent maintain context and proceed effectively.
What Does the AgentMemory Store?
The AgentMemory keeps a chronological record of the agent’s journey. For each run, it typically stores:
System Prompt: The initial instructions given to the agent’s LLM brain (we’ll see more in Chapter 5: PromptTemplates).
Initial Task: The main goal the user gave the agent (e.g., “What is the capital of France, and what is its current weather?”).
Steps: A list detailing each cycle of the agent’s operation:
Planning (Optional): If the agent makes plans, the plan itself is recorded.
Thinking: The LLM’s reasoning process and the action it decided to take (e.g., “Thought: I need the capital. Action: Use search tool”).
Action: The specific Tool called and the arguments used (e.g., search("Capital of France")). This could also be code execution for code-based agents.
Observation: The result received after performing the action (e.g., “Paris”).
Errors: If something went wrong during the step (e.g., a tool failed), the error is noted.
This detailed history allows the agent (specifically, the LLM guiding it) to look back at any point and understand the full context before deciding the next move.
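To make the list above concrete, here is a minimal sketch of how the records for one run could be modeled with plain dataclasses. The names `StepRecord` and `RunLog` are hypothetical and chosen only for illustration; they are not the library's own classes, which are described later in this chapter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepRecord:
    """One cycle of the agent (hypothetical name, for illustration only)."""
    thought: str                       # the LLM's reasoning
    action: str                        # the tool call or code that was run
    observation: Optional[str] = None  # result of the action, if any
    error: Optional[str] = None        # error message, if the step failed


@dataclass
class RunLog:
    """Everything recorded for a single run (hypothetical name)."""
    system_prompt: str
    task: str
    steps: List[StepRecord] = field(default_factory=list)


log = RunLog(
    system_prompt="You are a helpful assistant...",
    task="What is the capital of France, and what is its current weather?",
)
log.steps.append(
    StepRecord(
        thought="I need the capital.",
        action='search("Capital of France")',
        observation="Paris",
    )
)

for number, step in enumerate(log.steps, start=1):
    print(f"Step {number}: {step.action} -> {step.observation}")
```

Keeping the system prompt and task apart from the list of steps mirrors the description above: the first two are recorded once per run, while the step list grows with every cycle.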
How is AgentMemory Used? (Mostly Automatic!)
The good news is that you, as the user, usually don’t need to interact directly with AgentMemory. The MultiStepAgent manages it automatically behind the scenes!
Here’s the key interaction:
Before “Thinking”: When the agent needs to decide the next step (the “Think” phase), the MultiStepAgent asks the AgentMemory to format the recorded history (task, previous actions, observations, errors) into a sequence of messages. In the simplified code later in this chapter, this is done by the write_memory_to_messages method.
Consulting the Brain: This formatted history is sent to the LLM via the Model Interface. This gives the LLM the full context it needs to provide a sensible next step. (“Okay, based on the task ‘Capital and Weather’, and the fact we just found ‘Paris’, what should we do now?”).
After “Acting” and “Observing”: Once the agent performs an action and gets an observation (or an error), the MultiStepAgent records this new information as a new step in the AgentMemory.
So, the memory is constantly being read from (to inform the LLM) and written to (to record new events).
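The read-then-write cycle described above can be sketched with a toy loop. Everything here is an assumption made for illustration: `scripted_llm` is a hard-coded stand-in for a real model, `TOOLS` holds fake tools with fixed results, and the memory is just a list of strings rather than the library's step objects.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy tools standing in for real ones (hypothetical, hard-coded results).
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: "Paris",
    "weather": lambda city: "Sunny, 25°C",
}


def scripted_llm(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: chooses the next action from what the history holds."""
    text = "\n".join(history)
    if "Paris" not in text:
        return ("search", "Capital of France")
    if "Sunny" not in text:
        return ("weather", "Paris")
    return ("final_answer", "Paris; Sunny, 25°C")


def run(task: str, max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    memory: List[str] = [f"Task: {task}"]  # write: record the task
    for _ in range(max_steps):
        # Read: the full history is handed to the "LLM" before it decides.
        tool_name, argument = scripted_llm(list(memory))
        if tool_name == "final_answer":
            memory.append(f"Final answer: {argument}")
            return argument, memory
        observation = TOOLS[tool_name](argument)  # act
        # Write: record what was done and what came back.
        memory.append(f"Action: {tool_name}({argument!r})")
        memory.append(f"Observation: {observation}")
    return None, memory


answer, memory = run("What is the capital of France, and what is its current weather?")
print(answer)
for line in memory:
    print(line)
```

The stand-in model only reaches the weather lookup because the "Paris" observation from the first step is present in the history it receives, which is exactly the role the memory plays.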
Example Revisited: Capital and Weather Logbook
Let’s trace our “Capital of France and Weather” example from Chapter 1: MultiStepAgent and see what the AgentMemory logbook might look like (simplified):
(Start of Run)
System Prompt: Recorded (e.g., “You are a helpful assistant…”)
Task: Recorded (task: "What is the capital of France, and what is its current weather?")
(Step 1)
Think/Action: Recorded (thought: "Need capital.", action: search("Capital of France"))
Observation: Recorded (observation: "Paris")
(Step 2)
Think/Action: Recorded (thought: "Have capital (Paris), need weather.", action: weather("Paris"))
Observation: Recorded (observation: "Sunny, 25°C")
(Step 3)
Think/Action: Recorded (thought: "Have capital and weather. Task complete.", action: final_answer("The capital of France is Paris, and the current weather there is Sunny, 25°C."))
Observation: Recorded (Result of final_answer is the final output).
(End of Run)
Now, before Step 2 starts, the agent reads the first four entries from memory (system prompt, task, and Step 1’s action and observation) to give context to the LLM. Before Step 3, it reads the first six. This prevents the agent from forgetting what it’s doing!
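The entry counts in this trace can be checked with a few lines of code. This is only a sketch of the simplified logbook shown above, where each step contributes exactly two entries; the helper name `context_before` is hypothetical.

```python
# The logbook entries from the trace above, in recording order.
entries = [
    "System prompt: You are a helpful assistant...",
    "Task: What is the capital of France, and what is its current weather?",
    'Step 1 action: search("Capital of France")',
    "Step 1 observation: Paris",
    'Step 2 action: weather("Paris")',
    "Step 2 observation: Sunny, 25°C",
]


def context_before(step_number: int) -> list:
    """Return the entries that exist before the given step starts.

    Two entries (system prompt, task) are present from the start, and each
    completed step adds two more (action, observation) in this simplified log.
    """
    return entries[: 2 + 2 * (step_number - 1)]


print(len(context_before(2)))  # 4
print(len(context_before(3)))  # 6
```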
Under the Hood: Memory Structure
How does SmolaAgents actually implement this?
Core Idea: The AgentMemory object holds a list called steps. Each item in this list represents one distinct event or phase in the agent’s run. These items are usually instances of specific “Step” classes.
Key Step Types (Simplified from memory.py):
SystemPromptStep: Stores the initial system prompt text.
TaskStep: Stores the user’s task description (and potentially input images).
PlanningStep (Optional): Stores any explicit plans the agent generates.
ActionStep: This is the most common one, recording a single Think-Act-Observe cycle. It contains fields for:
step_number
model_input_messages: What was sent to the LLM for thinking.
model_output_message: The LLM’s raw response (thought + action plan).
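tool_calls: Which Tool was called (name, arguments). Stored as ToolCall objects.
observations: The result returned by the tool call or code execution.
error: Any error encountered during the step.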
FinalAnswerStep: A special step indicating the final result returned by the agent.
Interaction Flow:
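Here’s how the MultiStepAgent uses AgentMemory during a run: it records the task, then for each step it reads the history from memory, sends it to the LLM, executes the chosen action, and appends the resulting step to memory.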
Code Glimpse (Simplified):
Let’s look at some relevant pieces from memory.py and agents.py.
Memory Step Dataclasses (memory.py): Define the structure of log entries.
# --- File: memory.py (Simplified Step Structures) ---
from dataclasses import dataclass
from typing import List, Any, Dict


@dataclass
class ToolCall:
    # Represents a tool invocation request
    name: str
    arguments: Any
    id: str  # Unique ID for matching responses


@dataclass
class MemoryStep:
    # Base class for all memory entries
    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Each step type knows how to format itself for the LLM
        raise NotImplementedError


@dataclass
class TaskStep(MemoryStep):
    task: str
    # ... (potentially images)

    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Format: {"role": "user", "content": [{"type": "text", "text": "New task: ..."}]}
        # ... simplified ...
        return [{"role": "user", "content": f"New task:\n{self.task}"}]


@dataclass
class ActionStep(MemoryStep):
    step_number: int
    # model_input_messages: List = None  # What was sent to LLM
    model_output: str | None = None  # LLM's thought/action text
    tool_calls: List[ToolCall] | None = None  # Parsed tool calls
    observations: str | None = None  # Tool results or code output
    error: Any | None = None  # Any error encountered
    # ... other fields like timing ...

    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Formats the LLM output, tool calls, observations/errors
        # into messages for the next LLM call.
        messages = []
        if self.model_output:
            messages.append({"role": "assistant", "content": self.model_output})
        if self.tool_calls:
            # Simplified representation
            messages.append({"role": "tool_call", "content": f"Calling: {self.tool_calls[0].name}(...)"})
        if self.observations:
            messages.append({"role": "tool_response", "content": f"Observation:\n{self.observations}"})
        if self.error:
            messages.append({"role": "tool_response", "content": f"Error:\n{self.error}"})
        return messages

# ... potentially other step types like SystemPromptStep, PlanningStep ...
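To see what to_messages produces, the sketch below re-declares trimmed copies of ToolCall and ActionStep so that it runs on its own, then formats one successful step and one failed step. The message roles follow the simplified listing above and are assumptions of this sketch, not necessarily what the library emits; the sample thoughts, call IDs, and error text are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    arguments: Any
    id: str


@dataclass
class ActionStep:
    step_number: int
    model_output: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None
    observations: Optional[str] = None
    error: Optional[Any] = None

    def to_messages(self, **kwargs) -> List[Dict[str, Any]]:
        # Only the fields that were actually filled in become messages.
        messages = []
        if self.model_output:
            messages.append({"role": "assistant", "content": self.model_output})
        if self.tool_calls:
            messages.append({"role": "tool_call", "content": f"Calling: {self.tool_calls[0].name}(...)"})
        if self.observations:
            messages.append({"role": "tool_response", "content": f"Observation:\n{self.observations}"})
        if self.error:
            messages.append({"role": "tool_response", "content": f"Error:\n{self.error}"})
        return messages


ok_step = ActionStep(
    step_number=1,
    model_output="Thought: I need the capital.",
    tool_calls=[ToolCall(name="search", arguments={"query": "Capital of France"}, id="call_1")],
    observations="Paris",
)
failed_step = ActionStep(
    step_number=2,
    model_output="Thought: I need the weather.",
    tool_calls=[ToolCall(name="weather", arguments={"city": "Paris"}, id="call_2")],
    error="Weather service timed out",
)

ok_messages = ok_step.to_messages()
failed_messages = failed_step.to_messages()
for message in ok_messages + failed_messages:
    print(message["role"], "|", message["content"].replace("\n", " "))
```

Because a failed step is formatted like any other, the error text reaches the LLM on the next call, which is how the agent gets a chance to react to mistakes within the same task.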
AgentMemory Class (memory.py): Holds the list of steps.
# --- File: memory.py (Simplified AgentMemory) ---
from typing import List, Union


@dataclass
class SystemPromptStep(MemoryStep):
    # Simplified
    system_prompt: str

    def to_messages(self, **kwargs):
        # Simplified
        return [{"role": "system", "content": self.system_prompt}]


class AgentMemory:
    def __init__(self, system_prompt: str):
        # Initialize with the system prompt
        self.system_prompt = SystemPromptStep(system_prompt=system_prompt)
        # The main logbook - a list of steps taken
        self.steps: List[Union[TaskStep, ActionStep, PlanningStep]] = []

    def reset(self):
        """Clears the memory for a new run."""
        self.steps = []

    def replay(self, logger, detailed: bool = False):
        """Utility to print the memory steps nicely."""
        # ... implementation uses logger to print each step ...
        pass
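One detail of this class is worth trying out: reset() empties the list of steps but leaves the system prompt in place, because the prompt is stored in its own attribute. The sketch below uses trimmed, self-contained copies of the simplified classes above to show this; it is an illustration under those assumptions, not the library's full implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemPromptStep:
    system_prompt: str


@dataclass
class TaskStep:
    task: str


class AgentMemory:
    def __init__(self, system_prompt: str):
        # The system prompt lives outside the step list.
        self.system_prompt = SystemPromptStep(system_prompt=system_prompt)
        self.steps: List[TaskStep] = []

    def reset(self):
        """Clears the recorded steps for a new run."""
        self.steps = []


memory = AgentMemory("You are a helpful assistant...")
memory.steps.append(TaskStep(task="What is the capital of France?"))
steps_after_first_run = len(memory.steps)

memory.reset()  # start a fresh run
steps_after_reset = len(memory.steps)

print(steps_after_first_run, steps_after_reset)
print(memory.system_prompt.system_prompt)
```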
Agent Using Memory (agents.py): How MultiStepAgent reads and writes.
# --- File: agents.py (Simplified MultiStepAgent interactions) ---
from .memory import AgentMemory, TaskStep, ActionStep, ToolCall  # Import memory components


class MultiStepAgent:
    def __init__(self, ..., memory: Optional[AgentMemory] = None):
        # ... setup model, tools ...
        self.system_prompt = self.initialize_system_prompt()  # Define system prompt
        # Create the memory instance
        self.memory = memory if memory is not None else AgentMemory(self.system_prompt)
        # ... setup logger, monitor ...

    def run(self, task: str, ...):
        # ... setup ...
        if reset:  # Option to clear memory before a new run
            self.memory.reset()
        # Record the initial task in memory
        self.memory.steps.append(TaskStep(task=self.task))
        # Start the internal execution loop (_run)
        # ... calls _run ...
        final_result = # ... get result from _run ...
        return final_result

    def _run(self, task: str, max_steps: int, ...) -> Generator:
        # ... loop initialization ...
        while final_answer is None and self.step_number <= max_steps:
            # ... (handle planning steps if enabled) ...
            # Create a placeholder for the current step's data
            action_step = self._create_action_step(...)
            try:
                # === Execute one step (Think -> Act -> Observe) ===
                # This method internally calls write_memory_to_messages,
                # calls the model, executes the tool, and populates
                # the 'action_step' object with results.
                final_answer = self._execute_step(task, action_step)
            except AgentError as e:
                # Record errors in the memory step
                action_step.error = e
            finally:
                # Finalize timing etc. for the step
                self._finalize_step(action_step, ...)
                # === Store the completed step in memory ===
                self.memory.steps.append(action_step)
                # ... yield step details ...
                self.step_number += 1
        # ... handle finish ...
        yield FinalAnswerStep(final_answer)

    def write_memory_to_messages(self, summary_mode: Optional[bool] = False) -> List[Dict[str, str]]:
        """
        Reads history from memory and formats it for the LLM.
        """
        messages = self.memory.system_prompt.to_messages(summary_mode=summary_mode)
        # Go through each step recorded in memory
        for memory_step in self.memory.steps:
            # Ask each step to format itself into messages
            messages.extend(memory_step.to_messages(summary_mode=summary_mode))
        return messages

    def _execute_step(self, task: str, memory_step: ActionStep) -> Union[None, Any]:
        self.logger.log_rule(f"Step {self.step_number}", level=LogLevel.INFO)
        # === THINK ===
        # 1. Get history from memory
        messages_for_llm = self.write_memory_to_messages()
        memory_step.model_input_messages = messages_for_llm  # Record input to LLM
        # 2. Call the LLM brain
        llm_response = self.model(messages_for_llm, ...)  # Call Model Interface
        memory_step.model_output_message = llm_response  # Record LLM response
        # 3. Parse LLM response for action
        # (Specific parsing logic depends on AgentType - ToolCallingAgent, CodeAgent)
        tool_name, arguments = self._parse_action(llm_response)  # Simplified
        memory_step.tool_calls = [ToolCall(name=tool_name, arguments=arguments, id=...)]
        # === ACT & OBSERVE ===
        # 4. Execute the action (tool call or code)
        observation = self._execute_action(tool_name, arguments)  # Simplified
        # 5. Record observation
        memory_step.observations = str(observation)
        # 6. Check if it's the final answer
        if tool_name == "final_answer":
            return observation  # Return the final answer to stop the loop
        else:
            return None  # Continue to the next step

    # ... other methods like _create_action_step, _finalize_step ...
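The flattening done by write_memory_to_messages can be tried in isolation. The sketch below is a standalone approximation: `Step` and `Memory` are hypothetical, trimmed stand-ins for the step classes and AgentMemory, and the function is written as a free function rather than an agent method so that the example runs by itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """Trimmed stand-in for the step classes: a role plus some text."""
    role: str
    content: str

    def to_messages(self, **kwargs) -> List[Dict[str, str]]:
        return [{"role": self.role, "content": self.content}]


@dataclass
class Memory:
    system_prompt: Step
    steps: List[Step] = field(default_factory=list)


def write_memory_to_messages(memory: Memory) -> List[Dict[str, str]]:
    # System prompt first, then every recorded step in order.
    messages = memory.system_prompt.to_messages()
    for step in memory.steps:
        messages.extend(step.to_messages())
    return messages


memory = Memory(system_prompt=Step("system", "You are a helpful assistant..."))
memory.steps.append(Step("user", "New task:\nCapital of France and its weather?"))
before_step_1 = write_memory_to_messages(memory)

memory.steps.append(Step("assistant", "Thought: I need the capital."))
memory.steps.append(Step("tool_response", "Observation:\nParis"))
before_step_2 = write_memory_to_messages(memory)

print(len(before_step_1), len(before_step_2))
```

The message list is rebuilt from scratch on every call, so whatever has been appended to the step list since the previous call shows up in the next prompt without extra bookkeeping.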
Key Takeaways from Code:
Memory holds a list of Step objects (self.memory.steps).
The agent adds new TaskStep or ActionStep objects to this list as it progresses (self.memory.steps.append(...)).
Before calling the LLM, write_memory_to_messages iterates through self.memory.steps, calling to_messages() on each step to build the history.
Each step (like ActionStep) stores details like the LLM’s output (model_output), tool calls (tool_calls), and results (observations or error).
Conclusion
AgentMemory is the agent’s essential logbook, providing the context needed to navigate complex, multi-step tasks. It diligently records the initial task, system instructions, and every action, observation, and error along the way.
You’ve learned:
Why memory is crucial for agents (avoiding amnesia).
The “ship’s logbook” analogy.
What kind of information AgentMemory stores (task, system prompt, steps with thoughts, actions, observations, errors).
How the MultiStepAgent uses memory automatically: reading history before thinking, and writing results after acting/observing.
The basic structure of AgentMemory and its Step objects (TaskStep, ActionStep).
While you often don’t need to manipulate memory directly, understanding its role is key to understanding how agents maintain context and achieve complex goals. The content of this memory directly influences the prompts sent to the LLM. How can we customize those prompts? Let’s find out!