Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
In Chapter 3: BaseAgent - The Agent Blueprint, we learned how BaseAgent provides the standard structure for our agents, including a brain (LLM) and memory (Message / Memory). But what if we want our agent to do more than just think and remember? What if we want it to act in the world – like searching the web, running code, or editing files?
This is where Tools come in!
What Problem Do They Solve?
Imagine an agent trying to answer the question: “What’s the weather like in Tokyo right now?”
The agent’s LLM brain has a lot of general knowledge, but it doesn’t have real-time access to the internet. It can’t check the current weather. It needs a specific capability or skill to do that.
Similarly, if you ask an agent to “Write a Python script that prints ‘hello world’ and save it to a file named hello.py,” the agent needs the ability to:
1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually execute code to create and write to a file.
Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer’s file system – something the LLM can’t do directly.
Tools give agents these specific, actionable skills. A ToolCollection organizes these skills so the agent knows what it can do.
Use Case: Let’s build towards an agent that can:
- Search the web for today’s date.
- Tell the user the date.
This agent needs a “Web Search” tool.
Key Concepts: Tools and Toolboxes
Let’s break down the three main ideas:
1. BaseTool: The Blueprint for a Skill
Think of BaseTool (app/tool/base.py) as the template or design specification for any tool. It doesn’t do anything itself, but it defines what every tool needs to have:
- name (str): A short, descriptive name for the tool (e.g., web_search, file_writer, code_runner). This is how the agent (or LLM) identifies the tool.
- description (str): A clear explanation of what the tool does, what it’s good for, and when to use it. This is crucial for the LLM to decide which tool to use for a given task.
- parameters (dict): A definition of the inputs the tool expects. For example, a web_search tool needs a query input, and a file_writer needs a path and content. This is defined using a standard format called JSON Schema.
- execute method: An abstract method. This means BaseTool says “every tool must have an execute method”, but each specific tool needs to provide its own instructions for how to actually perform the action.
You almost never use BaseTool directly. You use it as a starting point to create actual, usable tools.
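To make the "blueprint" idea concrete, here is a minimal sketch of the pattern using only the standard library. SketchBaseTool and UppercaseTool are hypothetical names invented for this illustration; the real BaseTool in app/tool/base.py may be structured differently.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class SketchBaseTool(ABC):
    """Hypothetical stand-in for BaseTool; not the real OpenManus class."""

    name: str
    description: str
    parameters: Optional[Dict[str, Any]] = None

    @abstractmethod
    async def execute(self, **kwargs: Any) -> Any:
        """Each concrete tool supplies its own behaviour here."""


class UppercaseTool(SketchBaseTool):
    """Hypothetical concrete tool used only to exercise the blueprint."""

    name = "uppercase"
    description = "Returns the given text in upper case."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> str:
        return text.upper()


def try_instantiate_blueprint() -> str:
    """Report whether the abstract blueprint can be instantiated directly."""
    try:
        SketchBaseTool()
    except TypeError:
        # Python refuses to instantiate a class with unimplemented abstract methods
        return "blocked"
    return "allowed"


blueprint_status = try_instantiate_blueprint()
shout = asyncio.run(UppercaseTool().execute(text="hello"))
print(blueprint_status, shout)
```

The blueprint itself cannot be instantiated because execute is abstract, while the subclass that implements execute works as an ordinary tool.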
2. Concrete Tools: The Actual Skills
These are specific classes that inherit from BaseTool and provide the real implementation for the execute method. OpenManus comes with several pre-built tools:
- WebSearch (app/tool/web_search.py): Searches the web using engines like Google, Bing, etc.
- Bash (app/tool/bash.py): Executes shell commands (like ls, pwd, python script.py).
- StrReplaceEditor (app/tool/str_replace_editor.py): Views, creates, and edits files by replacing text.
- BrowserUseTool (app/tool/browser_use_tool.py): Interacts with web pages like a user (clicking, filling forms, etc.).
- Terminate (app/tool/terminate.py): A special tool used by agents to signal they have finished their task.
Each of these defines its specific name, description, parameters, and implements the execute method to perform its unique action.
3. ToolCollection: The Agent’s Toolbox
Think of a handyman. They don’t just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
A ToolCollection (app/tool/tool_collection.py) is like that toolbox for an agent.
- It holds a list of specific tool instances (like WebSearch, Bash).
- It allows the agent (and its LLM) to see all the available tools and their descriptions.
- It provides a way to execute a specific tool by its name.
When an agent needs to perform an action, its LLM can look at the ToolCollection, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool’s parameters, and then ask the ToolCollection to execute that tool with those inputs.
How Do We Use Them?
Let’s see how we can equip an agent with a simple tool. We’ll create a basic “EchoTool” first.
1. Creating a Concrete Tool (Inheriting from BaseTool):
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult

# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""
    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"],  # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")

# Create an instance of our tool
echo_tool_instance = EchoTool()
print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
Explanation:
- We import BaseTool and ToolResult (a standard object for wrapping tool outputs).
- class EchoTool(BaseTool): declares that our EchoTool is a type of BaseTool.
- We define the name, description, and parameters according to the BaseTool template. The parameters structure tells the LLM what input is expected (message as a string) and that it’s required.
- We implement async def execute(self, message: str) -> ToolResult:. This is the specific logic for our tool. It takes the message input and returns it wrapped in a ToolResult.
Example Output:
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
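The snippet above only reads the tool's attributes; it never runs execute. Because execute is declared with async def, it has to be awaited inside an event loop. The sketch below shows one way to do that with asyncio.run. StubToolResult and StubEchoTool are hypothetical stand-ins so the example runs without the OpenManus package; with the real classes the call pattern should look the same, but that is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubToolResult:
    """Hypothetical stand-in for ToolResult with the two fields used here."""

    output: Optional[str] = None
    error: Optional[str] = None


class StubEchoTool:
    """Stand-in mirroring EchoTool's execute signature, without BaseTool."""

    name = "echo_message"

    async def execute(self, message: str) -> StubToolResult:
        return StubToolResult(output=f"You said: {message}")


async def main() -> StubToolResult:
    tool = StubEchoTool()
    # execute is a coroutine function, so its result must be awaited
    return await tool.execute(message="hello")


result = asyncio.run(main())
print(result.output)  # You said: hello
```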
2. Creating a ToolCollection:
Now, let’s put our EchoTool and the built-in WebSearch tool into a toolbox.
# Import json for pretty-printing, plus ToolCollection and the tools we want
import json
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined
# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch() # Uses default settings
# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)
# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")
# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print("\nParameters for LLM (showing first tool):")
print(json.dumps(tool_params_for_llm[0], indent=2))
Explanation:
- We import ToolCollection and the specific tools (WebSearch, EchoTool).
- We create instances of the tools we need.
- my_toolbox = ToolCollection(echo_tool, web_search_tool) creates the collection, holding our tool instances.
- We can access the tools inside using my_toolbox.tools or iterate over my_toolbox.
- my_toolbox.to_params() is a crucial method. It formats the name, description, and parameters of all tools in the collection into a list of dictionaries. This specific format is exactly what the agent’s LLM needs (when using its ask_tool method) to understand which tools are available and how to use them.
Example Output:
Tools in the toolbox: ['echo_message', 'web_search']
Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}
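The output above suggests that each tool's three fields are simply wrapped in a "function" envelope. The sketch below reproduces that shape with a plain function. to_param_sketch is a hypothetical helper written for this illustration; it is not the to_param() method defined in BaseTool, whose implementation is not shown in this chapter.

```python
import json
from typing import Any, Dict


def to_param_sketch(
    name: str, description: str, parameters: Dict[str, Any]
) -> Dict[str, Any]:
    """Wrap a tool's three fields in the envelope seen in the example output."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }


echo_param = to_param_sketch(
    name="echo_message",
    description="Repeats back the text provided in the 'message' parameter.",
    parameters={
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            }
        },
        "required": ["message"],
    },
)
print(json.dumps(echo_param, indent=2))
```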
3. Agent Using the ToolCollection:
Now, how does an agent like ToolCallAgent (a specific type of BaseAgent) use this?
Conceptually (the real agent code is more complex):
1. The agent is configured with a ToolCollection (like my_toolbox).
2. When the agent needs to figure out the next step, it calls its LLM’s ask_tool method.
3. It passes the conversation history (Message / Memory) AND the output of my_toolbox.to_params() to the LLM.
4. The LLM looks at the conversation and the list of available tools (from to_params()). It reads the description of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked “What’s today’s date?”, the LLM sees the web_search tool is available and appropriate), it will generate a special response indicating:
   - The name of the tool to use (e.g., "web_search").
   - The arguments (inputs) for the tool, based on its parameters (e.g., {"query": "today's date"}).
6. The agent receives this response from the LLM.
7. The agent then uses the ToolCollection’s execute method: await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"}).
8. The ToolCollection finds the WebSearch tool instance in its internal tool_map and calls its execute method with the provided input.
9. The WebSearch tool runs, performs the actual web search, and returns the results (as a ToolResult or similar).
10. The agent takes this result, formats it as a tool message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool’s result as context).
The ToolCollection acts as the crucial bridge between the LLM’s decision to use a tool and the actual execution of that tool’s code.
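The hand-off from "LLM picks a tool" to "result goes back into memory" can be sketched without a real LLM or network access. Everything here is a stand-in: fake_web_search, TOOLBOX, the hard-coded llm_tool_call, and the shape of the returned tool message are assumptions made for illustration, not the actual ToolCallAgent code. The one grounded detail is that in OpenAI-style function calling the arguments arrive as a JSON string, so they are parsed before dispatch.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict


async def fake_web_search(query: str) -> str:
    """Hypothetical stand-in for a search tool; performs no network call."""
    return f"(pretend search results for: {query})"


# Maps tool names to callables, playing the role of the toolbox's lookup table
TOOLBOX: Dict[str, Callable[..., Awaitable[str]]] = {"web_search": fake_web_search}

# A hard-coded tool call standing in for the LLM's response
llm_tool_call = {
    "id": "call_1",
    "name": "web_search",
    "arguments": '{"query": "today\'s date"}',
}


async def handle_tool_call(call: Dict[str, str]) -> Dict[str, Any]:
    """Parse the arguments, run the named tool, and build a tool message."""
    args = json.loads(call["arguments"] or "{}")
    tool = TOOLBOX.get(call["name"])
    if tool is None:
        content = f"Error: unknown tool {call['name']}"
    else:
        content = await tool(**args)
    # Assumed message shape; the real agent builds its own Message object
    return {
        "role": "tool",
        "tool_call_id": call["id"],
        "name": call["name"],
        "content": content,
    }


tool_message = asyncio.run(handle_tool_call(llm_tool_call))
print(tool_message)
```

In a real agent this message would be appended to memory before the LLM is asked for its next step.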
Under the Hood: How ToolCollection.execute Works
Let’s trace the flow when an agent asks its ToolCollection to run a tool:
sequenceDiagram
participant Agent as ToolCallAgent
participant LLM as LLM (Deciding Step)
participant Toolbox as ToolCollection
participant SpecificTool as e.g., WebSearch Tool
Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
LLM->>LLM: Analyzes messages & available tools
LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
Note right of Toolbox: Finds the WebSearch instance
Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
SpecificTool->>SpecificTool: Performs actual web search action
SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
Toolbox-->>-Agent: Returns the ToolResult
Agent->>Agent: Processes the result (adds to memory, etc.)
Code Glimpse:
Let’s look at the ToolCollection itself in app/tool/tool_collection.py:
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple
from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError

class ToolCollection:
    # ... (Config class) ...
    tools: Tuple[BaseTool, ...]  # Holds the tool instances
    tool_map: Dict[str, BaseTool]  # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")
        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool, calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
            # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
            # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...
Explanation:
- The __init__ method takes tool instances and stores them in self.tools (a tuple) and self.tool_map (a dictionary mapping name to instance).
- to_params iterates through self.tools and calls each tool’s to_param() method (defined in BaseTool) to get the LLM-compatible format.
- execute is the core method used by agents:
  - It uses self.tool_map.get(name) to quickly find the correct tool instance based on the requested name.
  - If found, it calls await tool(**(tool_input or {})). The ** unpacks the tool_input dictionary into keyword arguments for the tool’s execute method (e.g., message="hello" for our EchoTool, or query="today's date" for WebSearch).
  - It wraps the execution in try...except blocks to catch errors and return a standardized ToolFailure result if anything goes wrong.
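To see the three outcomes described above (success, unknown tool name, and an error during execution), here is a small self-contained sketch of the same dispatch pattern. MiniResult and MiniCollection are hypothetical stand-ins built on plain coroutine functions rather than BaseTool instances, so this illustrates the pattern without reproducing the real class.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


@dataclass
class MiniResult:
    """Hypothetical stand-in for ToolResult / ToolFailure."""

    output: Optional[str] = None
    error: Optional[str] = None


async def echo(message: str) -> str:
    return f"You said: {message}"


class MiniCollection:
    """Name-to-callable lookup with the same error handling idea."""

    def __init__(self, **tools: Callable[..., Awaitable[str]]) -> None:
        self.tool_map = dict(tools)

    async def execute(
        self, *, name: str, tool_input: Optional[Dict[str, Any]] = None
    ) -> MiniResult:
        tool = self.tool_map.get(name)
        if tool is None:
            return MiniResult(error=f"Tool {name} is invalid")
        try:
            # ** turns {"message": "hello"} into message="hello"
            return MiniResult(output=await tool(**(tool_input or {})))
        except Exception as exc:
            return MiniResult(error=f"Unexpected error executing tool {name}: {exc}")


async def demo() -> Tuple[MiniResult, MiniResult, MiniResult]:
    box = MiniCollection(echo_message=echo)
    ok = await box.execute(name="echo_message", tool_input={"message": "hello"})
    missing = await box.execute(name="no_such_tool")
    # Omitting the required 'message' argument raises TypeError inside execute
    bad_args = await box.execute(name="echo_message")
    return ok, missing, bad_args


ok, missing, bad_args = asyncio.run(demo())
print(ok.output)
print(missing.error)
print(bad_args.error)
```

Returning a failure object instead of raising means the caller always gets a result it can pass back to the LLM, even when the tool name or its arguments are wrong.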
Wrapping Up Chapter 4
We’ve learned how Tools give agents specific skills beyond basic language understanding.
- BaseTool is the abstract blueprint defining a tool’s name, description, and expected parameters.
- Concrete tools (like WebSearch, Bash, or our custom EchoTool) inherit from BaseTool and implement the actual execute logic.
- ToolCollection acts as the agent’s toolbox, holding various tools and providing methods (to_params, execute) for the agent (often guided by its LLM) to discover and use these capabilities.
With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That’s where Flows come in.
Let’s move on to Chapter 5: BaseFlow to see how we orchestrate complex workflows involving multiple agents and steps.
Generated by AI Codebase Knowledge Builder