Chapter 4: Tool / ToolCollection - Giving Your Agent Skills

In Chapter 3: BaseAgent - The Agent Blueprint, we learned how BaseAgent provides the standard structure for our agents, including a brain (LLM) and memory (Message / Memory). But what if we want our agent to do more than just think and remember? What if we want it to act in the world – like searching the web, running code, or editing files?

This is where Tools come in!

What Problem Do They Solve?

Imagine an agent trying to answer the question: “What’s the weather like in Tokyo right now?”

The agent’s LLM brain has a lot of general knowledge, but it doesn’t have real-time access to the internet. It can’t check the current weather. It needs a specific capability or skill to do that.

Similarly, if you ask an agent to “Write a python script that prints ‘hello world’ and save it to a file named hello.py,” the agent needs the ability to:

  1. Understand the request (using its LLM).
  2. Write the code (using its LLM).
  3. Actually execute code to create and write to a file.

Steps 1 and 2 are handled by the LLM, but step 3 requires interacting with the computer’s file system – something the LLM can’t do directly.

Tools give agents these specific, actionable skills. A ToolCollection organizes these skills so the agent knows what it can do.

Use Case: Let’s build towards an agent that can:

  1. Search the web for today’s date.
  2. Tell the user the date.

This agent needs a “Web Search” tool.

Key Concepts: Tools and Toolboxes

Let’s break down the two main ideas:

1. BaseTool: The Blueprint for a Skill

Think of BaseTool (app/tool/base.py) as the template or design specification for any tool. It doesn’t do anything itself, but it defines what every tool needs to have:

  • name (str): A short, descriptive name for the tool (e.g., web_search, file_writer, code_runner). This is how the agent (or LLM) identifies the tool.
  • description (str): A clear explanation of what the tool does, what it’s good for, and when to use it. This is crucial for the LLM to decide which tool to use for a given task.
  • parameters (dict): A definition of the inputs the tool expects. For example, a web_search tool needs a query input, and a file_writer needs a path and content. This is defined using a standard format called JSON Schema.
  • execute method: An abstract method. This means BaseTool says “every tool must have an execute method”, but each specific tool needs to provide its own instructions for how to actually perform the action.

You almost never use BaseTool directly. You use it as a starting point to create actual, usable tools.

2. Concrete Tools: The Actual Skills

These are specific classes that inherit from BaseTool and provide the real implementation for the execute method. OpenManus comes with several pre-built tools:

  • WebSearch (app/tool/web_search.py): Searches the web using engines like Google, Bing, etc.
  • Bash (app/tool/bash.py): Executes shell commands (like ls, pwd, python script.py).
  • StrReplaceEditor (app/tool/str_replace_editor.py): Views, creates, and edits files by replacing text.
  • BrowserUseTool (app/tool/browser_use_tool.py): Interacts with web pages like a user (clicking, filling forms, etc.).
  • Terminate (app/tool/terminate.py): A special tool used by agents to signal they have finished their task.

Each of these defines its specific name, description, parameters, and implements the execute method to perform its unique action.

3. ToolCollection: The Agent’s Toolbox

Think of a handyman. They don’t just carry one tool; they have a toolbox filled with hammers, screwdrivers, wrenches, etc.

A ToolCollection (app/tool/tool_collection.py) is like that toolbox for an agent.

  • It holds a list of specific tool instances (like WebSearch, Bash).
  • It allows the agent (and its LLM) to see all the available tools and their descriptions.
  • It provides a way to execute a specific tool by its name.

When an agent needs to perform an action, its LLM can look at the ToolCollection, read the descriptions of the available tools, choose the best one for the job, figure out the necessary inputs based on the tool’s parameters, and then ask the ToolCollection to execute that tool with those inputs.

How Do We Use Them?

Let’s see how we can equip an agent with a simple tool. We’ll create a basic “EchoTool” first.

1. Creating a Concrete Tool (Inheriting from BaseTool):

# Import the necessary base class
from app.tool.base import BaseTool, ToolResult

# Define our simple tool
class EchoTool(BaseTool):
    """A simple tool that echoes the input text."""

    name: str = "echo_message"
    description: str = "Repeats back the text provided in the 'message' parameter."
    parameters: dict = {
        "type": "object",
        "properties": {
            "message": {
                "type": "string",
                "description": "The text to be echoed back.",
            },
        },
        "required": ["message"], # Tells the LLM 'message' must be provided
    }

    # Implement the actual action
    async def execute(self, message: str) -> ToolResult:
        """Takes a message and returns it."""
        print(f"EchoTool executing with message: '{message}'")
        # ToolResult is a standard way to return tool output
        return ToolResult(output=f"You said: {message}")

# Create an instance of our tool
echo_tool_instance = EchoTool()

print(f"Tool Name: {echo_tool_instance.name}")
print(f"Tool Description: {echo_tool_instance.description}")

Explanation:

  • We import BaseTool and ToolResult (a standard object for wrapping tool outputs).
  • class EchoTool(BaseTool): declares that our EchoTool is a type of BaseTool.
  • We define the name, description, and parameters according to the BaseTool template. The parameters structure tells the LLM what input is expected (message as a string) and that it’s required.
  • We implement async def execute(self, message: str) -> ToolResult:. This is the specific logic for our tool. It takes the message input and returns it wrapped in a ToolResult.

Example Output:

Tool Name: echo_message
Tool Description: Repeats back the text provided in the 'message' parameter.

2. Creating a ToolCollection:

Now, let’s put our EchoTool and the built-in WebSearch tool into a toolbox.

# Import ToolCollection and the tools we want
from app.tool import ToolCollection, WebSearch
# Assume EchoTool class is defined as above
# from your_module import EchoTool # Or wherever EchoTool is defined

# Create instances of the tools
echo_tool = EchoTool()
web_search_tool = WebSearch() # Uses default settings

# Create a ToolCollection containing these tools
my_toolbox = ToolCollection(echo_tool, web_search_tool)

# See the names of the tools in the collection
tool_names = [tool.name for tool in my_toolbox]
print(f"Tools in the toolbox: {tool_names}")

# Get the parameters needed for the LLM
tool_params_for_llm = my_toolbox.to_params()
print(f"\nParameters for LLM (showing first tool):")
import json
print(json.dumps(tool_params_for_llm[0], indent=2))

Explanation:

  • We import ToolCollection and the specific tools (WebSearch, EchoTool).
  • We create instances of the tools we need.
  • my_toolbox = ToolCollection(echo_tool, web_search_tool) creates the collection, holding our tool instances.
  • We can access the tools inside using my_toolbox.tools or iterate over my_toolbox.
  • my_toolbox.to_params() is a crucial method. It formats the name, description, and parameters of all tools in the collection into a list of dictionaries. This specific format is exactly what the agent’s LLM needs (when using its ask_tool method) to understand which tools are available and how to use them.

Example Output:

Tools in the toolbox: ['echo_message', 'web_search']

Parameters for LLM (showing first tool):
{
  "type": "function",
  "function": {
    "name": "echo_message",
    "description": "Repeats back the text provided in the 'message' parameter.",
    "parameters": {
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "The text to be echoed back."
        }
      },
      "required": [
        "message"
      ]
    }
  }
}

3. Agent Using the ToolCollection:

Now, how does an agent like ToolCallAgent (a specific type of BaseAgent) use this?

Conceptually (the real agent code is more complex):

  1. The agent is configured with a ToolCollection (like my_toolbox).
  2. When the agent needs to figure out the next step, it calls its LLM’s ask_tool method.
  3. It passes the conversation history (Message / Memory) AND the output of my_toolbox.to_params() to the LLM.
  4. The LLM looks at the conversation and the list of available tools (from to_params()). It reads the description of each tool to understand what it does.
  5. If the LLM decides a tool is needed (e.g., the user asked “What’s today’s date?”, the LLM sees the web_search tool is available and appropriate), it will generate a special response indicating:
    • The name of the tool to use (e.g., "web_search").
    • The arguments (inputs) for the tool, based on its parameters (e.g., {"query": "today's date"}).
  6. The agent receives this response from the LLM.
  7. The agent then uses the ToolCollection’s execute method: await my_toolbox.execute(name="web_search", tool_input={"query": "today's date"}).
  8. The ToolCollection finds the WebSearch tool instance in its internal tool_map and calls its execute method with the provided input.
  9. The WebSearch tool runs, performs the actual web search, and returns the results (as a ToolResult or similar).
  10. The agent takes this result, formats it as a tool message, adds it to its memory, and continues its thinking process (often asking the LLM again, now with the tool’s result as context).

The ToolCollection acts as the crucial bridge between the LLM’s decision to use a tool and the actual execution of that tool’s code.

Under the Hood: How ToolCollection.execute Works

Let’s trace the flow when an agent asks its ToolCollection to run a tool:

sequenceDiagram
    participant Agent as ToolCallAgent
    participant LLM as LLM (Deciding Step)
    participant Toolbox as ToolCollection
    participant SpecificTool as e.g., WebSearch Tool

    Agent->>+LLM: ask_tool(messages, tools=Toolbox.to_params())
    LLM->>LLM: Analyzes messages & available tools
    LLM-->>-Agent: Response indicating tool call: name='web_search', arguments={'query': '...'}
    Agent->>+Toolbox: execute(name='web_search', tool_input={'query': '...'})
    Toolbox->>Toolbox: Look up 'web_search' in internal tool_map
    Note right of Toolbox: Finds the WebSearch instance
    Toolbox->>+SpecificTool: Calls execute(**tool_input) on the found tool
    SpecificTool->>SpecificTool: Performs actual web search action
    SpecificTool-->>-Toolbox: Returns ToolResult (output="...", error=None)
    Toolbox-->>-Agent: Returns the ToolResult
    Agent->>Agent: Processes the result (adds to memory, etc.)

Code Glimpse:

Let’s look at the ToolCollection itself in app/tool/tool_collection.py:

# Simplified snippet from app/tool/tool_collection.py
from typing import Any, Dict, List, Tuple
from app.tool.base import BaseTool, ToolResult, ToolFailure
from app.exceptions import ToolError

class ToolCollection:
    # ... (Config class) ...

    tools: Tuple[BaseTool, ...] # Holds the tool instances
    tool_map: Dict[str, BaseTool] # Maps name to tool instance for quick lookup

    def __init__(self, *tools: BaseTool):
        """Initializes with a sequence of tools."""
        self.tools = tools
        # Create the map for easy lookup by name
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> List[Dict[str, Any]]:
        """Formats tools for the LLM API."""
        # Calls the 'to_param()' method on each tool
        return [tool.to_param() for tool in self.tools]

    async def execute(
        self, *, name: str, tool_input: Dict[str, Any] = None
    ) -> ToolResult:
        """Finds a tool by name and executes it."""
        # 1. Find the tool instance using the name
        tool = self.tool_map.get(name)
        if not tool:
            # Return a standard failure result if tool not found
            return ToolFailure(error=f"Tool {name} is invalid")

        # 2. Execute the tool's specific method
        try:
            # The 'tool(**tool_input)' calls the tool instance's __call__ method,
            # which in BaseTool, calls the tool's 'execute' method.
            # The ** unpacks the dictionary into keyword arguments.
            result = await tool(**(tool_input or {}))
            # Ensure the result is a ToolResult (or subclass)
            return result if isinstance(result, ToolResult) else ToolResult(output=str(result))
        except ToolError as e:
             # Handle errors specific to tools
            return ToolFailure(error=e.message)
        except Exception as e:
             # Handle unexpected errors during execution
            return ToolFailure(error=f"Unexpected error executing tool {name}: {e}")

    # ... other methods like add_tool, __iter__ ...

Explanation:

  • The __init__ method takes tool instances and stores them in self.tools (a tuple) and self.tool_map (a dictionary mapping name to instance).
  • to_params iterates through self.tools and calls each tool’s to_param() method (defined in BaseTool) to get the LLM-compatible format.
  • execute is the core method used by agents:
    • It uses self.tool_map.get(name) to quickly find the correct tool instance based on the requested name.
    • If found, it calls await tool(**(tool_input or {})). The ** unpacks the tool_input dictionary into keyword arguments for the tool’s execute method (e.g., message="hello" for our EchoTool, or query="today's date" for WebSearch).
    • It wraps the execution in try...except blocks to catch errors and return a standardized ToolFailure result if anything goes wrong.

Wrapping Up Chapter 4

We’ve learned how Tools give agents specific skills beyond basic language understanding.

  • BaseTool is the abstract blueprint defining a tool’s name, description, and expected parameters.
  • Concrete tools (like WebSearch, Bash, or our custom EchoTool) inherit from BaseTool and implement the actual execute logic.
  • ToolCollection acts as the agent’s toolbox, holding various tools and providing methods (to_params, execute) for the agent (often guided by its LLM) to discover and use these capabilities.

With tools, agents can interact with external systems, run code, access real-time data, and perform complex actions, making them much more powerful.

But how do we coordinate multiple agents, potentially using different tools, to work together on a larger task? That’s where Flows come in.

Let’s move on to Chapter 5: BaseFlow to see how we orchestrate complex workflows involving multiple agents and steps.


Generated by AI Codebase Knowledge Builder