Chapter 3: Tool - Giving Your Agent Superpowers
Welcome back! In Chapter 2: Model Interface, we learned how our MultiStepAgent
uses a “universal remote” (the Model Interface) to talk to its LLM “brain”. The LLM thinks and suggests what the agent should do next.
But how does the agent actually do things? If the LLM suggests “Search the web for the capital of France,” how does the agent perform the search? It can’t just magically type into Google!
This is where Tools come in. They are the agent’s hands and specialized equipment, allowing it to interact with the world beyond just generating text.
The Problem: An Agent Trapped in its Mind
Imagine a brilliant chef who only knows recipes but is locked in an empty room. They can tell you exactly how to make a perfect soufflé, step-by-step, but they can’t actually do any of it. They have no ingredients, no oven, no whisk, no bowls. They’re stuck!
An agent without tools is like that chef. The LLM brain can reason and plan (“I need to search the web”), but the agent itself has no way to execute that plan (“How do I actually search?”).
The Solution: The Agent’s Toolbox
Tools are specific capabilities we give to our agent. Think of them like the utensils and appliances in a kitchen drawer:
- Peeler: Used for peeling vegetables.
- Whisk: Used for mixing ingredients.
- Oven: Used for baking.
- Search Engine Tool: Used for searching the web.
- Calculator Tool: Used for performing calculations.
- Code Execution Tool: Used for running computer code.
Each tool is a reusable function that the agent can call upon to perform a specific action. The agent acts like the chef, looking at the next step in the recipe (the LLM’s suggestion) and picking the right tool from its toolbox.
What Makes a Tool?
Every tool in SmolaAgents needs a few key pieces of information so the agent (and the LLM helping it) can understand it:
- `name`: A short, descriptive name for the tool (e.g., `web_search`, `calculator`). This is how the agent identifies which tool to use.
- `description`: A clear explanation of what the tool does, what it’s good for, and what information it needs. This helps the LLM decide when to suggest using this tool. Example: “Performs a web search using DuckDuckGo and returns the top results.”
- `inputs`: Defines what information the tool needs to do its job. This is like specifying that a peeler needs a vegetable, or a calculator needs numbers and an operation. It’s defined as a dictionary where keys are argument names and values describe the type and purpose. Example: `{"query": {"type": "string", "description": "The search query"}}`.
- `output_type`: Describes the type of result the tool will return (e.g., `string`, `number`, `image`).
- `forward` method: This is the actual Python code that gets executed when the tool is used. It takes the defined `inputs` as arguments and performs the tool’s action, returning the result.
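Putting those five pieces together, the skeleton of every tool looks roughly like this (a hypothetical `MyTool`, just to show the shape; the next section builds a real one):

```python
from smolagents import Tool

class MyTool(Tool):  # hypothetical skeleton, not a real tool
    name = "my_tool"  # how the agent refers to it
    description = "One sentence on what the tool does and when to use it."
    inputs = {
        "some_arg": {"type": "string", "description": "What this argument means."}
    }
    output_type = "string"  # the type of value forward() returns

    def forward(self, some_arg: str) -> str:
        # The actual action happens here.
        return f"Did something with {some_arg}"
```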
Creating Your First Tool: The GreetingTool
Let’s build a very simple tool. Imagine we want our agent to be able to greet someone by name.
We’ll create a `GreetingTool` by inheriting from the base `Tool` class provided by SmolaAgents.
```python
# --- File: simple_tools.py ---
from smolagents import Tool  # Import the base class

class GreetingTool(Tool):
    """A simple tool that generates a greeting."""

    # 1. Give it a unique name
    name: str = "greet_person"

    # 2. Describe what it does clearly
    description: str = "Greets a person by their name."

    # 3. Define the inputs it needs
    #    It needs one input: the 'name' of the person, which should be a string.
    inputs: dict = {
        "name": {
            "type": "string",
            "description": "The name of the person to greet."
        }
    }

    # 4. Specify the type of the output
    #    It will return the greeting as a string.
    output_type: str = "string"

    # 5. Implement the action in the 'forward' method
    def forward(self, name: str) -> str:
        """The actual code that runs when the tool is called."""
        print(f"--- GreetingTool activated with name: {name} ---")
        greeting = f"Hello, {name}! Nice to meet you."
        return greeting

# Let's test it quickly (outside the agent context)
greeter = GreetingTool()
result = greeter(name="Alice")  # Calling the tool instance runs forward()
print(f"Tool returned: '{result}'")

# Expected Output:
# --- GreetingTool activated with name: Alice ---
# Tool returned: 'Hello, Alice! Nice to meet you.'
```
Explanation:
- Import: We import the base `Tool` class.
- Class Definition: We define `GreetingTool`, inheriting from `Tool`.
- Attributes: We set the required class attributes: `name`, `description`, `inputs`, and `output_type`. These tell the agent everything it needs to know about the tool without running it (you can verify this with the quick check below).
- `forward` Method: This method contains the core logic. It takes the `name` (defined in `inputs`) as an argument and returns the greeting string. We added a `print` statement just to see when it runs.
- Testing: We create an instance `greeter` and call it like a function, passing the required argument `name="Alice"`. It executes the `forward` method and returns the result.
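Since `name`, `description`, `inputs`, and `output_type` are plain class attributes, you can peek at exactly what the agent (and, via the prompt, the LLM) will read about the tool without ever executing it. A quick check, assuming the `GreetingTool` defined above:

```python
# Inspect the metadata the agent reads from our tool (no forward() call involved).
greeter = GreetingTool()

print(greeter.name)         # greet_person
print(greeter.description)  # Greets a person by their name.
print(greeter.inputs)       # {'name': {'type': 'string', 'description': 'The name of the person to greet.'}}
print(greeter.output_type)  # string
```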
This `GreetingTool` is now ready to be added to an agent’s toolbox!
Adding the Tool to Your Agent
Remember how we created our `MultiStepAgent` in Chapter 1? We gave it a model and a list of tools. Let’s add our new `GreetingTool`:
```python
# --- File: agent_with_greeting.py ---
# (Assuming GreetingTool is defined as above or imported)
# from simple_tools import GreetingTool
from smolagents import MultiStepAgent
from smolagents.models import LiteLLMModel  # From Chapter 2
# Potentially other tools like SearchTool etc.

# 1. Create an instance of our new tool
greeting_tool = GreetingTool()

# 2. Create instances of any other tools the agent might need
# search_tool = SearchTool()  # Example from Chapter 1

# 3. Choose a language model (the "brain")
llm = LiteLLMModel(model_id="gpt-3.5-turbo")  # Needs API key setup

# 4. Create the MultiStepAgent, passing the tool(s) in a list
agent = MultiStepAgent(
    model=llm,
    tools=[greeting_tool]  # Add our tool here! Maybe add search_tool too?
    # tools=[greeting_tool, search_tool]
)
print("Agent created with GreetingTool!")

# 5. Give the agent a task that might use the tool
task = "Greet the user named Bob."
print(f"Running agent with task: '{task}'")

# The agent will now start its Think-Act-Observe cycle...
final_answer = agent.run(task)

print("-" * 20)
print(f"Final Answer received: {final_answer}")

# --- Expected Interaction (Simplified) ---
# Agent (thinks): The task is to greet Bob. I have a 'greet_person' tool.
# Agent (acts): Use tool 'greet_person' with input name="Bob".
# --- GreetingTool activated with name: Bob ---  (Our print statement)
# Agent (observes): Tool returned "Hello, Bob! Nice to meet you."
# Agent (thinks): I have the greeting. That completes the task.
# Agent (acts): Use 'final_answer' tool with "Hello, Bob! Nice to meet you."
# --------------------
# Final Answer received: Hello, Bob! Nice to meet you.
```
Explanation:
- We create an instance of `GreetingTool`.
- We put this instance into the `tools` list when initializing `MultiStepAgent`.
- The agent now “knows” about the `greet_person` tool, its description, and how to use it (via its `name` and `inputs`).
- When we run the agent with the task “Greet the user named Bob,” the LLM (using the tool descriptions provided in the prompt; a rough sketch of what that looks like follows below) will likely recognize that the `greet_person` tool is perfect for this.
- The agent will then execute the `greeting_tool.forward(name="Bob")` method during its “Act” phase.
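To make “the tool descriptions provided in the prompt” concrete, here is a rough, hypothetical sketch of how a tool’s metadata could be rendered into prompt text. SmolaAgents uses its own (richer) prompt templates; this is only to illustrate the idea:

```python
# Illustrative only: how tool metadata might be turned into a line of prompt text.
# The real SmolaAgents prompt templates are more elaborate than this.
def describe_tool_for_prompt(tool) -> str:
    args = ", ".join(
        f"{arg_name} ({spec['type']}) - {spec['description']}"
        for arg_name, spec in tool.inputs.items()
    )
    return f"{tool.name}: {tool.description} Arguments: {args} Returns: {tool.output_type}"

print(describe_tool_for_prompt(GreetingTool()))
# greet_person: Greets a person by their name. Arguments: name (string) - The name of the person to greet. Returns: string
```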
How the Agent Uses Tools: Under the Hood
Let’s revisit the Think -> Act -> Observe cycle from Chapter 1 and see exactly where tools fit in.
- Think: The agent gathers its history (AgentMemory) and the available tool descriptions. It sends this to the LLM via the Model Interface, asking, “What should I do next to accomplish the task ‘Greet Bob’?” The LLM, seeing the `greet_person` tool description, might respond with something like: `{"thought": "The user wants me to greet Bob. I should use the 'greet_person' tool.", "action": "greet_person", "action_input": {"name": "Bob"}}` (Note: the exact format depends on the agent type and model. Some models use explicit tool-calling formats like the one shown in Chapter 2’s `ToolCallingAgent` example output.)
- Act: The `MultiStepAgent` receives this response.
  - It parses the response to identify the intended action (`greet_person`) and the `action_input` (`{"name": "Bob"}`).
  - It looks up the tool named `greet_person` in its `self.tools` dictionary.
  - It calls the `forward` method of that tool instance, passing the arguments from `action_input`. In our case: `greeting_tool.forward(name="Bob")`.
  - This executes our Python code inside the `forward` method.
- Observe: The agent captures the return value from the `forward` method (e.g., `"Hello, Bob! Nice to meet you."`). This becomes the “observation” for this step.
  - This observation is stored in the AgentMemory.
  - The cycle repeats: the agent thinks again, now considering the result of the greeting tool. It likely decides the task is complete and uses the built-in `final_answer` tool.
Here’s a simplified diagram:
```mermaid
sequenceDiagram
    participant Agent as MultiStepAgent
    participant LLM as LLM Brain
    participant GreetTool as GreetingTool

    Agent->>LLM: Task: Greet Bob. Tools: [greet_person]. What next?
    LLM-->>Agent: Use tool 'greet_person' with name='Bob'
    Agent->>GreetTool: forward(name="Bob")
    GreetTool-->>Agent: "Hello, Bob! Nice to meet you." (Observation)
    Agent->>LLM: Observation: "Hello, Bob!..." Task done?
    LLM-->>Agent: Use tool 'final_answer' with "Hello, Bob!..."
    Agent-->>User: "Hello, Bob! Nice to meet you."
```
Code Glimpse (Simplified `execute_tool_call`):
Inside the `agents.py` file (specifically within agent types like `ToolCallingAgent`), there’s logic similar to this (heavily simplified):
```python
# --- Simplified concept from agents.py ---
class SomeAgentType(MultiStepAgent):
    # ... other methods ...

    def execute_tool_call(self, tool_name: str, arguments: dict) -> Any:
        # Find the tool in the agent's toolbox
        if tool_name in self.tools:
            tool_instance = self.tools[tool_name]
            try:
                # Call the tool's forward method with the arguments!
                # This is where GreetingTool.forward(name="Bob") happens.
                result = tool_instance(**arguments)  # Uses ** to unpack the dict
                return result
            except Exception as e:
                # Handle errors if the tool fails
                print(f"Error executing tool {tool_name}: {e}")
                return f"Error: Tool {tool_name} failed."
        elif tool_name == "final_answer":
            # Special handling for the final answer
            return arguments.get("answer", arguments)  # Return the final answer content
        else:
            # Handle the case where tool_name is not found
            return f"Error: Unknown tool {tool_name}."

    def step(self, memory_step: ActionStep):
        # === THINK ===
        llm_response = self.model(...)  # ... (Agent thinks and gets the LLM response) ...

        if ...:  # the LLM response suggests a tool call
            tool_name = ...   # ... parse tool name from response ...
            arguments = ...   # ... parse arguments from response ...

            # === ACT ===
            observation = self.execute_tool_call(tool_name, arguments)
            memory_step.observations = str(observation)  # Store observation (=== OBSERVE ===)

            if tool_name == "final_answer":
                return observation  # Signal that this is the final answer

        # ... (handle cases where LLM gives text instead of a tool call) ...
        return None  # Not the final answer yet
```
This shows the core idea: the agent gets the `tool_name` and `arguments` from the LLM, finds the corresponding `Tool` object, and calls its `forward` method using the arguments.
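To see the whole round trip in miniature, here is a self-contained sketch (not the library’s actual code, and reusing the `GreetingTool` from earlier) that takes a JSON action like the one shown above, looks the tool up in a toolbox dictionary, and dispatches the call:

```python
import json

# A hypothetical LLM response in the JSON action format shown earlier.
llm_response = '{"thought": "Greet Bob.", "action": "greet_person", "action_input": {"name": "Bob"}}'

# The agent's toolbox: tool instances keyed by their name.
tools = {"greet_person": GreetingTool()}

parsed = json.loads(llm_response)
tool_name = parsed["action"]        # "greet_person"
arguments = parsed["action_input"]  # {"name": "Bob"}

# Dispatch: calling the tool instance runs its forward() method.
observation = tools[tool_name](**arguments)
print(observation)  # Hello, Bob! Nice to meet you.
```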
Common Built-in Tools
SmolaAgents comes with several useful tools ready to use (found in `default_tools.py`):
- `DuckDuckGoSearchTool` (`web_search`): Searches the web using DuckDuckGo.
- `PythonInterpreterTool` (`python_interpreter`): Executes Python code snippets safely. Very powerful for calculations, data manipulation, etc. (Used primarily by `CodeAgent`; see Chapter 6: PythonExecutor.)
- `VisitWebpageTool` (`visit_webpage`): Fetches the content of a webpage URL.
- `FinalAnswerTool` (`final_answer`): A special, essential tool. The agent uses this only when it believes it has completed the task and has the final result. Calling this tool usually ends the agent’s run. It’s automatically added to every agent.
You can import and use these just like we used our `GreetingTool`:
```python
from smolagents import DuckDuckGoSearchTool, FinalAnswerTool  # FinalAnswerTool is usually added automatically

search_tool = DuckDuckGoSearchTool()
# calculator_tool = PythonInterpreterTool()  # Often used internally by CodeAgent

agent = MultiStepAgent(
    model=llm,
    tools=[search_tool]  # Agent can now search!
)
```
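With a search tool in its toolbox, the agent can now tackle the question from the start of this chapter. A quick usage sketch (it needs network access and the model API key configured, and the exact wording of the answer will vary):

```python
# The agent can now use web_search inside its Think-Act-Observe cycle.
answer = agent.run("What is the capital of France?")
print(answer)  # Expected: something like "The capital of France is Paris."
```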
Conclusion
Tools are the bridge between an agent’s reasoning and the real world (or specific functionalities like code execution). They are reusable capabilities defined by their `name`, `description`, `inputs`, `output_type`, and the core logic in their `forward` method.
You’ve learned:
- Why agents need tools (like a chef needs utensils).
- The essential components of a `Tool` in SmolaAgents.
- How to create a simple custom tool (`GreetingTool`).
- How to give tools to your `MultiStepAgent`.
- How the agent uses the LLM’s suggestions to select and execute the correct tool during the “Act” phase.
- About some common built-in tools.
By equipping your agent with the right set of tools, you dramatically expand the range of tasks it can accomplish! But as the agent takes multiple steps, using tools and getting results, how does it keep track of everything that has happened? That’s where memory comes in.
Next Chapter: Chapter 4: AgentMemory - The Agent’s Notepad.