Chapter 4: Tool / ToolCollection - Giving Your Agent Skills
In Chapter 3: BaseAgent - The Agent Blueprint, we learned how BaseAgent provides the standard structure for our agents, including a brain (LLM) and memory (Message / Memory). But what if we want our agent to do more than just think and remember? What if we want it to act in the world – like searching the web, running code, or editing files?
This is where Tools come in!
What Problem Do They Solve?
Imagine an agent trying to answer the question: “What’s the weather like in Tokyo right now?”
The agent’s LLM brain has a lot of general knowledge, but it doesn’t have real-time access to the internet. It can’t check the current weather. It needs a specific capability or skill to do that.
Similarly, if you ask an agent to “Write a python script that prints ‘hello world’ and save it to a file named hello.py,” the agent needs the ability to:
1. Understand the request (using its LLM).
2. Write the code (using its LLM).
3. Actually execute code to create and write to a file.
Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer’s file system – something the LLM can’t do directly.
Tools give agents these specific, actionable skills. A ToolCollection organizes these skills so the agent knows what it can do.
Use Case: Let’s build towards an agent that can:
- Search the web for today’s date.
- Tell the user the date.
This agent needs a “Web Search” tool.
Key Concepts: Tools and Toolboxes
Let’s break down the three main ideas:
1. BaseTool: The Blueprint for a Skill
Think of BaseTool (app/tool/base.py) as the template or design specification for any tool. It doesn’t do anything itself, but it defines what every tool needs to have:
- name (str): A short, descriptive name for the tool (e.g., web_search, file_writer, code_runner). This is how the agent (or LLM) identifies the tool.
- description (str): A clear explanation of what the tool does, what it’s good for, and when to use it. This is crucial for the LLM to decide which tool to use for a given task.
- parameters (dict): A definition of the inputs the tool expects. For example, a web_search tool needs a query input, and a file_writer needs a path and content. This is defined using a standard format called JSON Schema.
- execute method: An abstract method. This means BaseTool says “every tool must have an execute method”, but each specific tool needs to provide its own instructions for how to actually perform the action.
You almost never use BaseTool directly. You use it as a starting point to create actual, usable tools.
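To make the blueprint idea concrete, here is a minimal sketch of what such an abstract base could look like. It is not the OpenManus source: SketchBaseTool and UpperCaseTool are hypothetical names, and the sketch uses Python's abc module as an assumption about how the "must implement execute" rule might be enforced.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchBaseTool(ABC):
    """Hypothetical stand-in for BaseTool; not the OpenManus class."""

    name: str = ""
    description: str = ""
    parameters: Dict[str, Any] = {}

    @abstractmethod
    async def execute(self, **kwargs: Any) -> Any:
        """Each concrete tool supplies its own action here."""

    async def __call__(self, **kwargs: Any) -> Any:
        # Calling the tool instance forwards to its execute method.
        return await self.execute(**kwargs)

    def to_param(self) -> Dict[str, Any]:
        # Package the three descriptive fields in a function-calling layout.
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class UpperCaseTool(SketchBaseTool):
    """Hypothetical concrete tool used only for this illustration."""

    name = "upper_case"
    description = "Returns the given text in upper case."
    parameters = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    async def execute(self, text: str) -> str:
        return text.upper()


# The abstract blueprint cannot be instantiated on its own.
try:
    SketchBaseTool()
    blueprint_instantiable = True
except TypeError:
    blueprint_instantiable = False

shout = asyncio.run(UpperCaseTool()(text="hello"))
print(blueprint_instantiable, shout)
```

The point of the sketch is the split of responsibilities: the base class fixes the shape (name, description, parameters, execute), and each subclass fills in the behavior.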
2. Concrete Tools: The Actual Skills
These are specific classes that inherit from BaseTool and provide the real implementation for the execute method. OpenManus comes with several pre-built tools:
- WebSearch (app/tool/web_search.py): Searches the web using engines like Google, Bing, etc.
- Bash (app/tool/bash.py): Executes shell commands (like ls, pwd, python script.py).
- StrReplaceEditor (app/tool/str_replace_editor.py): Views, creates, and edits files by replacing text.
- BrowserUseTool (app/tool/browser_use_tool.py): Interacts with web pages like a user (clicking, filling forms, etc.).
- Terminate (app/tool/terminate.py): A special tool used by agents to signal they have finished their task.
Each of these defines its specific name, description, parameters, and implements the execute method to perform its unique action.
3. ToolCollection: The Agent’s Toolbox
Think of a handyman. They don’t just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.
A ToolCollection (app/tool/tool_collection.py) is like that toolbox for an agent.
- It holds a list of specific tool instances (like WebSearch, Bash).
- It allows the agent (and its LLM) to see all the available tools and their descriptions.
- It provides a way to execute a specific tool by its name.
When an agent needs to perform an action, its LLM can look at the ToolCollection, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool’s parameters, and then ask the ToolCollection to execute that tool with those inputs.
How Do We Use Them?
Let’s see how we can equip an agent with a simple tool. We’ll create a basic “EchoTool” first.
1. Creating a Concrete Tool (Inheriting from BaseTool):
# Import the necessary base class
from app.tool.base import BaseTool, ToolResult

# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""

    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"],  # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")

# Create an instance of our tool
echo_tool_instance = EchoTool()
print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")
Explanation:
- We import BaseTool and ToolResult (a standard object for wrapping tool outputs).
- class EchoTool(BaseTool): declares that our EchoTool is a type of BaseTool.
- We define the name, description, and parameters according to the BaseTool template. The parameters structure tells the LLM what input is expected (message as a string) and that it’s required.
- We implement async def execute(self, message: str) -> ToolResult:. This is the specific logic for our tool. It takes the message input and returns it wrapped in a ToolResult.
Example Output:
Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.
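The snippet above only creates the tool; it never runs it. Because execute is declared with async def, it has to be awaited. The sketch below shows one way to do that with asyncio.run. So that it runs without the OpenManus package installed, it uses hypothetical stand-ins (SketchToolResult, SketchEchoTool) rather than the real ToolResult and EchoTool.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchToolResult:
    """Hypothetical stand-in for ToolResult, with only output and error fields."""

    output: Optional[str] = None
    error: Optional[str] = None


class SketchEchoTool:
    """Stand-in for EchoTool that does not depend on the OpenManus package."""

    name = "echo_message"

    async def execute(self, message: str) -> SketchToolResult:
        return SketchToolResult(output=f"You said: {message}")


async def main() -> SketchToolResult:
    tool = SketchEchoTool()
    # execute is a coroutine function, so its result must be awaited.
    return await tool.execute(message="hello")


echo_result = asyncio.run(main())
print(echo_result.output)  # You said: hello
```

Inside an agent you would not normally call asyncio.run yourself; the agent's own async loop awaits the tool.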
2. Creating a ToolCollection:
Now, let’s put our EchoTool and the built-in WebSearch tool into a toolbox.
import json

# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch

# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined

# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch()  # Uses default settings

# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)

# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")

# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print("\nParameters for LLM (showing first tool):")
print(json.dumps(tool_params_for_llm[0], indent=2))
Explanation:
- We import ToolCollection and the specific tools (WebSearch, EchoTool).
- We create instances of the tools we need.
- my_toolbox = ToolCollection(echo_tool, web_search_tool) creates the collection, holding our tool instances.
- We can access the tools inside using my_toolbox.tools or iterate over my_toolbox.
- my_toolbox.to_params() is a crucial method. It formats the name, description, and parameters of all tools in the collection into a list of dictionaries. This specific format is exactly what the agent’s LLM needs (when using its ask_tool method) to understand which tools are available and how to use them.
Example Output:
Tools in the toolbox: ['echo_message', 'web_search']
Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}
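The parameters block in this output also records which inputs are required. As a small illustration of how that information could be used, the sketch below checks a set of LLM-supplied arguments against the schema's required list before a tool is called. The helper missing_required is hypothetical; this chapter does not say that OpenManus performs such a check.

```python
from typing import Any, Dict, List

# Same schema as EchoTool's parameters in this chapter.
echo_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "message": {"type": "string", "description": "The text to be echoed back."},
    },
    "required": ["message"],
}


def missing_required(schema: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: names listed under 'required' that are absent from arguments."""
    return [key for key in schema.get("required", []) if key not in arguments]


ok_check = missing_required(echo_schema, {"message": "hi"})
bad_check = missing_required(echo_schema, {})
print(ok_check)   # []
print(bad_check)  # ['message']
```

This only checks for presence. Full JSON Schema validation (types, nested objects) would need a dedicated validator.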
3. Agent Using the ToolCollection:
Now, how does an agent like ToolCallAgent (a specific type of BaseAgent) use this?
Conceptually (the real agent code is more complex):
1. The agent is configured with a ToolCollection (like my_toolbox).
2. When the agent needs to figure out the next step, it calls its LLM’s ask_tool method.
3. It passes the conversation history (Message / Memory) and the output of my_toolbox.to_params() to the LLM.
4. The LLM looks at the conversation and the list of available tools (from to_params()). It reads the description of each tool to understand what it does.
5. If the LLM decides a tool is needed (e.g., the user asked “What’s today’s date?”, the LLM sees the web_search tool is available and appropriate), it will generate a special response indicating:
   - The name of the tool to use (e.g., "web_search").
   - The arguments (inputs) for the tool, based on its parameters (e.g., {"query": "today's date"}).
6. The agent receives this response from the LLM.
7. The agent then uses the ToolCollection’s execute method: await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"}).
8. The ToolCollection finds the WebSearch tool instance in its internal tool_map and calls its execute method with the provided input.
9. The WebSearch tool runs, performs the actual web search, and returns the results (as a ToolResult or similar).
10. The agent takes this result, formats it as a tool message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool’s result as context).
The ToolCollection acts as the crucial bridge between the LLM’s decision to use a tool and the actual execution of that tool’s code.
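The hand-off described above (LLM picks a tool, agent dispatches it, result becomes a tool message) can be sketched in a few lines. Everything here is a simplified assumption: fake_web_search returns canned text instead of searching, the tool call is a hand-written dictionary rather than a real LLM response, the arguments are assumed to arrive as a JSON string, and the message layout is illustrative rather than the exact OpenManus Message format.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict


async def fake_web_search(query: str) -> str:
    """Hypothetical stand-in for a search tool; returns canned text, no network."""
    return f"(canned result for: {query})"


# Name-to-callable lookup, playing the role of the toolbox's tool_map.
TOOL_MAP: Dict[str, Callable[..., Awaitable[str]]] = {"web_search": fake_web_search}

# Invented example of what the LLM's tool decision might carry.
fake_llm_tool_call = {
    "id": "call_1",
    "name": "web_search",
    "arguments": json.dumps({"query": "today's date"}),
}


async def handle_tool_call(call: Dict[str, str]) -> Dict[str, str]:
    # Decode the arguments the LLM produced.
    arguments = json.loads(call["arguments"] or "{}")
    # Look the tool up by name and run it.
    tool = TOOL_MAP.get(call["name"])
    if tool is None:
        content = f"Error: unknown tool {call['name']}"
    else:
        content = await tool(**arguments)
    # Wrap the result as a tool message that could be added to memory.
    return {
        "role": "tool",
        "tool_call_id": call["id"],
        "name": call["name"],
        "content": content,
    }


tool_message = asyncio.run(handle_tool_call(fake_llm_tool_call))
print(tool_message["content"])
```

In the real agent the lookup and call are delegated to ToolCollection.execute, and the loop then asks the LLM again with the new tool message in its history.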
Under the Hood: How ToolCollection.execute Works
Let’s trace the flow when an agent asks its ToolCollection to run a tool:
sequenceDiagram
participant Agent as ToolCallAgent
participant LLM as LLM (Deciding Step)
participant Toolbox as ToolCollection
participant SpecificTool as e.g., WebSearch Tool
Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
LLM->>LLM: Analyzes messages & available tools
LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
Note right of Toolbox: Finds the WebSearch instance
Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
SpecificTool->>SpecificTool: Performs actual web search action
SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
Toolbox-->>-Agent: Returns the ToolResult
Agent->>Agent: Processes the result (adds to memory, etc.)
Code Glimpse:
Let’s look at the ToolCollection itself in app/tool/tool_collection.py:
# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple

from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError


class ToolCollection:
    # ... (Config class) ...

    tools: Tuple[BaseTool, ...]  # Holds the tool instances
    tool_map: Dict[str, BaseTool]  # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")

        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool, calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
            # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
            # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...
Explanation:
- The __init__ method takes tool instances and stores them in self.tools (a tuple) and self.tool_map (a dictionary mapping name to instance).
- to_params iterates through self.tools and calls each tool’s to_param() method (defined in BaseTool) to get the LLM-compatible format.
- execute is the core method used by agents:
  - It uses self.tool_map.get(name) to quickly find the correct tool instance based on the requested name.
  - If found, it calls await tool(**(tool_input or {})). The ** unpacks the tool_input dictionary into keyword arguments for the tool’s execute method (e.g., message="hello" for our EchoTool, or query="today's date" for WebSearch).
  - It wraps the execution in try...except blocks to catch errors and return a standardized ToolFailure result if anything goes wrong.
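To see the two failure paths of execute in action, the sketch below mirrors the logic of the simplified snippet with hypothetical stand-ins (SketchResult, SketchCollection, BrokenTool) so that it runs without OpenManus. It folds the two except branches into one and uses a single result class with an error field instead of a separate ToolFailure.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchResult:
    """Hypothetical stand-in for ToolResult / ToolFailure."""

    output: Optional[str] = None
    error: Optional[str] = None


class BrokenTool:
    """Hypothetical tool whose action always fails."""

    name = "broken"

    async def __call__(self, **kwargs: Any) -> SketchResult:
        raise RuntimeError("disk full")


class SketchCollection:
    """Stand-in that mirrors the lookup-then-call logic of execute."""

    def __init__(self, *tools: Any) -> None:
        self.tool_map = {tool.name: tool for tool in tools}

    async def execute(
        self, *, name: str, tool_input: Optional[Dict[str, Any]] = None
    ) -> SketchResult:
        tool = self.tool_map.get(name)
        if not tool:
            # Unknown name: report a failure instead of raising.
            return SketchResult(error=f"Tool {name} is invalid")
        try:
            result = await tool(**(tool_input or {}))
            return result if isinstance(result, SketchResult) else SketchResult(output=str(result))
        except Exception as e:
            # Any exception from the tool becomes a failure result.
            return SketchResult(error=f"Unexpected error executing tool {name}: {e}")


box = SketchCollection(BrokenTool())
unknown = asyncio.run(box.execute(name="nope"))
failed = asyncio.run(box.execute(name="broken"))
print(unknown.error)  # Tool nope is invalid
print(failed.error)   # Unexpected error executing tool broken: disk full
```

Returning a failure object rather than raising means the agent can pass the error text back to the LLM as an ordinary tool result and let it decide what to try next.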
Wrapping Up Chapter 4
We’ve learned how Tools give agents specific skills beyond basic language understanding.
- BaseTool is the abstract blueprint defining a tool’s name, description, and expected parameters.
- Concrete tools (like WebSearch, Bash, or our custom EchoTool) inherit from BaseTool and implement the actual execute logic.
- ToolCollection acts as the agent’s toolbox, holding various tools and providing methods (to_params, execute) for the agent (often guided by its LLM) to discover and use these capabilities.
With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.
But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That’s where Flows come in.
Let’s move on to Chapter 5: BaseFlow to see how we orchestrate complex workflows involving multiple agents and steps.
Generated by AI Codebase Knowledge Builder