Chapter 9: MCP (Model Context Protocol)
Welcome to the final chapter of our core concepts tutorial! In Chapter 8: DockerSandbox, we saw how OpenManus can safely run code in an isolated environment. Now, let’s explore a powerful way to extend your agent’s capabilities without changing its internal code: the Model Context Protocol (MCP).
What Problem Does MCP Solve?
Imagine you have an agent running smoothly. Suddenly, you realize you need it to perform a new, specialized task – maybe interacting with a custom company database or using a complex scientific calculation library.
Normally, you might have to:
- Stop the agent.
- Write new code for the Tool that performs this task.
- Add this tool to the agent’s code or configuration.
- Restart the agent.
This process can be cumbersome, especially if you want to add or update tools frequently, or if different people are managing different tools.
What if there was a way for the agent to dynamically discover and use tools provided by a completely separate service? Like plugging in a new USB device, and your computer automatically recognizes and uses it?
This is what MCP enables! It defines a standard way for an OpenManus agent (MCPAgent) to connect to an external MCP Server. This server advertises the tools it offers, and the agent can call these tools remotely as if they were built in.
Use Case: Let’s say we want our agent to be able to run basic shell commands (like ls or pwd) using the Bash tool. Instead of building the Bash tool directly into the agent, we can run an MCPServer that offers the Bash tool. Our MCPAgent can connect to this server, discover the Bash tool, and use it when needed, all without having the Bash tool’s code inside the agent itself. If we later update the Bash tool on the server, the agent picks up the new version without needing changes.
Key Concepts: The Agent, The Server, and The Rules
MCP involves a few key players working together:
MCPServer (The Tool Provider):
- Think of this as a separate application, like a dedicated “Tool Shop” running independently from your agent.
- It holds one or more Tools (like Bash, BrowserUseTool, StrReplaceEditor, or custom ones).
- It “advertises” these tools, meaning it can tell connected clients (agents) which tools are available, what they do, and how to use them.
- When asked, it executes a tool and sends the result back.
- In OpenManus, app/mcp/server.py provides an implementation of this server.
MCPAgent (The Tool User):
- This is a specialized type of BaseAgent designed specifically to talk to an MCPServer.
- When it starts, it connects to the specified MCPServer.
- It asks the server: “What tools do you have?”
- It treats the server’s tools as its own available ToolCollection.
- When its LLM decides to use one of these tools, the MCPAgent sends a request to the MCPServer to execute it.
- It can even periodically check whether the server has added or removed tools and update its capabilities accordingly!
The Protocol (The Rules of Communication):
- MCP defines the exact format of messages exchanged between the MCPAgent and MCPServer: how the agent asks for the tool list, how it requests a tool execution, and how the result is formatted.
- OpenManus supports two main ways (transports) for this communication:
  - stdio (Standard Input/Output): The agent starts the server process directly and communicates with it using standard text streams (like typing commands in a terminal). This is simpler for local setups.
  - SSE (Server-Sent Events): The agent connects to a running server over the network (using HTTP). This is more suitable if the server is running elsewhere.
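To make the message format concrete: MCP messages are JSON-RPC 2.0 objects, and the tool operations use the methods tools/list and tools/call. The sketch below builds one request of each kind, plus the general shape of a tool result, using only the standard library. The helper name make_request and the sample values (the ids and the file listing) are illustrative assumptions, not OpenManus code.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # each JSON-RPC request carries an id so the reply can be matched to it


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request (hypothetical helper, for illustration only)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# "What tools do you have?"
list_request = make_request("tools/list")

# "Please run bash with command='ls'"
call_request = make_request(
    "tools/call", {"name": "bash", "arguments": {"command": "ls"}}
)

# General shape of a successful reply to the call above (sample text is made up)
sample_result = {
    "jsonrpc": "2.0",
    "id": call_request["id"],
    "result": {
        "content": [{"type": "text", "text": "README.md\napp\nconfig"}],
        "isError": False,
    },
}

for message in (list_request, call_request, sample_result):
    print(json.dumps(message))
```

In practice the MCP client library builds and parses these messages for you; the agent code only sees method calls such as list_tools and call_tool.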
Analogy: Imagine the MCPServer is a smart TV’s App Store, offering apps (tools) like Netflix or YouTube. The MCPAgent is a universal remote control. MCP is the protocol that lets the remote connect to the TV, see the available apps, and tell the TV “Launch Netflix” or “Play this video on YouTube”. The actual app logic runs on the TV (the server), not the remote (the agent).
How Do We Use It?
Let’s see how to run the server and connect an agent using the simple stdio method.
1. Run the MCPServer:
The server needs to be running first. OpenManus provides a script to run a server that includes standard tools like Bash, Browser, and Editor.
Open a terminal and run:
# Make sure you are in the root directory of the OpenManus project
# Use python to run the server module
python -m app.mcp.server --transport stdio
Expected Output (in the server terminal):
INFO:root:Registered tool: bash
INFO:root:Registered tool: browser
INFO:root:Registered tool: editor
INFO:root:Registered tool: terminate
INFO:root:Starting OpenManus server (stdio mode)
# --- The server is now running and waiting for a connection ---
Explanation:
- python -m app.mcp.server tells Python to run the server code located in app/mcp/server.py.
- --transport stdio specifies that it should listen for connections via standard input/output.
- It registers the built-in tools and waits.
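How the real server parses its command line is not shown in this chapter. As a rough sketch, a flag like --transport is commonly handled with argparse. The function name parse_args, the help text, and the decision to accept only stdio are assumptions made for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse launcher arguments (hypothetical helper; stdio is the only choice assumed here)."""
    parser = argparse.ArgumentParser(description="Toy MCP server launcher")
    parser.add_argument(
        "--transport",
        choices=["stdio"],
        default="stdio",
        help="how the server talks to its client",
    )
    return parser.parse_args(argv)


args = parse_args(["--transport", "stdio"])
print(args.transport)
```

Restricting choices means a typo in the flag value fails at startup with a usage message instead of surfacing later as a connection problem.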
2. Run the MCPAgent (connecting to the server):
Now, open a separate terminal. We’ll run a script that starts the MCPAgent and tells it how to connect to the server.
# In a NEW terminal, in the root directory of OpenManus
# Run the MCP agent runner script
python run_mcp.py --connection stdio --interactive
Expected Output (in the agent terminal):
INFO:app.config:Configuration loaded successfully from .../config/config.toml
INFO:app.agent.mcp:Initializing MCPAgent with stdio connection...
# ... (potential logs about connecting) ...
INFO:app.tool.mcp:Connected to server with tools: ['bash', 'browser', 'editor', 'terminate']
INFO:app.agent.mcp:Connected to MCP server via stdio
MCP Agent Interactive Mode (type 'exit' to quit)
Enter your request:
Explanation:
- python run_mcp.py runs the agent launcher script.
- --connection stdio tells the agent to connect using standard input/output. The script (run_mcp.py) knows how to start the server process (python -m app.mcp.server) for this mode, so the agent talks to a server process it launches itself rather than attaching to one that is already running.
- --interactive puts the agent in a mode where you can chat with it.
- The agent connects, asks the server for its tools (list_tools), and logs the tools it found (bash, browser, etc.). It’s now ready for your requests!
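The stdio connection described here can be imitated with the standard library alone: the client launches the server as a child process and exchanges one JSON message per line over the child's stdin and stdout. This is a toy sketch, not the OpenManus implementation. The child script, the function name list_tools_over_stdio, and the single advertised tool are made up, and the sketch skips the initialization handshake that a real MCP session performs first.

```python
import json
import subprocess
import sys

# Toy "server": reads one JSON-RPC request per line on stdin and answers on stdout.
CHILD_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    if request["method"] == "tools/list":
        result = {"tools": [{"name": "bash", "description": "Run a shell command"}]}
    else:
        result = {}
    reply = {"jsonrpc": "2.0", "id": request["id"], "result": result}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


def list_tools_over_stdio() -> list:
    """Start the toy server, ask for its tools, and return their names."""
    # The client owns the child's stdin/stdout pipes; that is the whole "connection".
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
        return [tool["name"] for tool in reply["result"]["tools"]]
    finally:
        proc.stdin.close()  # EOF ends the child's read loop
        proc.wait(timeout=5)
        proc.stdout.close()


tool_names = list_tools_over_stdio()
print(tool_names)
```

Because the server's stdout is the message channel, a server running in stdio mode has to keep its log output off stdout.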
3. Interact with the Agent (Using a Server Tool):
Now, in the agent’s interactive prompt, ask it to do something that requires a tool provided by the server, like listing files using bash:
# In the agent's terminal
Enter your request: Use the bash tool to list the files in the current directory.
What Happens:
- The MCPAgent receives your request.
- Its LLM analyzes the request and decides the bash tool is needed, with the command ls.
- The agent sees that bash is a tool provided by the connected MCPServer.
- The agent sends a call_tool request over stdio to the server: “Please run bash with command='ls'”.
- The MCPServer receives the request, finds its Bash tool, and executes ls.
- The server captures the output (the list of files).
- The server sends the result back to the agent.
- The agent receives the result, adds it to its Memory, and might use its LLM again to formulate a user-friendly response based on the tool’s output.
Expected Output (in the agent terminal, may vary):
# ... (Potential LLM thinking logs) ...
INFO:app.agent.mcp:Executing tool: bash with input {'command': 'ls'}
# ... (Server logs might show execution in its own terminal) ...
Agent: The bash tool executed the 'ls' command and returned the following output:
[List of files/directories in the project root, e.g.,]
README.md
app
config
run_mcp.py
... etc ...
Success! The agent used a tool (bash) that wasn’t part of its own code, but was provided dynamically by the external MCPServer via the Model Context Protocol. If you added a new tool to the MCPServer code and restarted the server, the agent could potentially discover and use it without needing any changes itself (it periodically refreshes the tool list).
Type exit in the agent’s terminal to stop it, then stop the server (usually Ctrl+C in its terminal).
Under the Hood: How MCP Communication Flows
Let’s trace the simplified steps when the agent uses a server tool:
- Connect & List: Agent starts, connects to Server (stdio or SSE). Agent sends list_tools request. Server replies with list of tools (name, description, parameters). Agent stores these.
- User Request: User asks agent to do something (e.g., “list files”).
- LLM Decides: Agent’s LLM decides to use bash tool with command='ls'.
- Agent Request: Agent finds bash in its list of server tools. Sends call_tool request to Server (containing tool name bash and arguments {'command': 'ls'}).
- Server Executes: Server receives request. Finds its internal Bash tool. Calls the tool’s execute(command='ls') method. The tool runs ls.
- Server Response: Server gets the result from the tool (e.g., “README.md\napp\n…”). Sends this result back to the Agent.
- Agent Processes: Agent receives the result. Updates its memory. Presents the answer to the user.
Sequence Diagram:
sequenceDiagram
participant User
participant Agent as MCPAgent
participant LLM as Agent's LLM
participant Server as MCPServer
participant BashTool as Bash Tool (on Server)
Note over Agent, Server: Initial Connection & list_tools (omitted for brevity)
User->>+Agent: "List files using bash"
Agent->>+LLM: ask_tool("List files", tools=[...bash_schema...])
LLM-->>-Agent: Decide: call tool 'bash', args={'command':'ls'}
Agent->>+Server: call_tool(name='bash', args={'command':'ls'})
Server->>+BashTool: execute(command='ls')
BashTool->>BashTool: Runs 'ls' command
BashTool-->>-Server: Returns file list string
Server-->>-Agent: Tool Result (output=file list)
Agent->>Agent: Process result, update memory
Agent-->>-User: "OK, the files are: ..."
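The flow traced above can be mimicked in a single process to see how the pieces fit together. This is a toy sketch with no network, no LLM, and no MCP library: ToyServer and ToyAgent are hypothetical stand-ins, the LLM's decision is hard-coded, and a harmless echo tool replaces bash.

```python
import asyncio


class ToyServer:
    """Stands in for MCPServer: owns the tools and executes them."""

    def __init__(self):
        self._tools = {"echo": ("Return the given text unchanged", self._echo)}

    async def _echo(self, text: str) -> str:
        return text

    async def list_tools(self) -> list:
        return [{"name": n, "description": d} for n, (d, _) in self._tools.items()]

    async def call_tool(self, name: str, arguments: dict) -> str:
        _, func = self._tools[name]
        return await func(**arguments)


class ToyAgent:
    """Stands in for MCPAgent: discovers tools, then forwards calls to the server."""

    def __init__(self, server: ToyServer):
        self.server = server
        self.tool_names = []
        self.memory = []

    async def connect(self) -> None:
        # Connect & List: remember what the server advertises
        self.tool_names = [t["name"] for t in await self.server.list_tools()]

    def decide(self, request: str):
        # Hard-coded stand-in for the LLM's tool choice
        return "echo", {"text": request}

    async def run(self, request: str) -> str:
        name, arguments = self.decide(request)
        if name not in self.tool_names:
            raise ValueError(f"Server does not offer tool {name!r}")
        result = await self.server.call_tool(name, arguments)  # remote call in real MCP
        self.memory.append({"tool": name, "result": result})
        return f"Tool {name} returned: {result}"


async def main():
    agent = ToyAgent(ToyServer())
    await agent.connect()
    return agent, await agent.run("hello")


agent, answer = asyncio.run(main())
print(answer)
```

The agent never holds the tool's implementation, only its name and description. In real MCP, the direct method calls on ToyServer are replaced by messages sent over the chosen transport.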
Code Glimpse: Key MCP Components
Let’s look at simplified parts of the relevant files.
1. MCPServer (app/mcp/server.py): Registering Tools
The server uses the FastMCP class from the mcp library (imported from mcp.server.fastmcp) to handle the protocol details. It needs to register the tools it wants to offer.
# Simplified snippet from app/mcp/server.py
import json
from typing import Dict

from mcp.server.fastmcp import FastMCP

from app.logger import logger
from app.tool.base import BaseTool
from app.tool.bash import Bash  # Import the tool to offer


class MCPServer:
    def __init__(self, name: str = "openmanus"):
        self.server = FastMCP(name)  # The underlying MCP server library
        self.tools: Dict[str, BaseTool] = {}
        # Add tools to offer
        self.tools["bash"] = Bash()
        # ... add other tools like Browser, Editor ...

    def register_tool(self, tool: BaseTool) -> None:
        """Registers a tool's execute method with the FastMCP server."""
        tool_name = tool.name
        tool_param = tool.to_param()  # Get schema for the LLM
        tool_function = tool_param["function"]

        # Define the function that the MCP server will expose
        async def tool_method(**kwargs):
            logger.info(f"Executing {tool_name} via MCP: {kwargs}")
            # Call the actual tool's execute method
            result = await tool.execute(**kwargs)
            logger.info(f"Result of {tool_name}: {result}")
            # Return a string: JSON for model-like results, str() otherwise
            if hasattr(result, "model_dump"):
                return json.dumps(result.model_dump())
            return str(result)

        # Attach metadata (name, description, parameters) for discovery.
        # The _build_docstring and _build_signature helpers are omitted from this snippet.
        tool_method.__name__ = tool_name
        tool_method.__doc__ = self._build_docstring(tool_function)
        tool_method.__signature__ = self._build_signature(tool_function)

        # Register with the FastMCP library instance
        self.server.tool()(tool_method)
        logger.info(f"Registered tool for MCP: {tool_name}")

    def register_all_tools(self) -> None:
        for tool in self.tools.values():
            self.register_tool(tool)

    def run(self, transport: str = "stdio") -> None:
        self.register_all_tools()
        logger.info(f"Starting MCP server ({transport} mode)")
        self.server.run(transport=transport)  # Start listening


# Command-line execution part:
# if __name__ == "__main__":
#     server = MCPServer()
#     server.run(transport="stdio")  # Or based on args
Explanation: The MCPServer creates instances of tools (Bash, etc.) and then uses register_tool to wrap each tool’s execute method in a function that FastMCP can register. Setting the wrapper’s name, docstring, and signature is what allows the server to advertise the tool (with its name, description, parameters) and call the correct function when the agent makes a call_tool request.
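The snippet above calls a _build_signature helper without showing it. One way such a helper could work is to turn the tool's JSON-schema parameters into an inspect.Signature and attach it to the generic **kwargs wrapper, so that introspection sees named parameters. The sketch below is a guess at the idea, not the OpenManus code: build_signature, the type mapping, and the sample schema are all assumptions.

```python
import inspect

# Partial mapping from JSON-schema type names to Python types (an assumption for this sketch)
_JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_signature(schema: dict) -> inspect.Signature:
    """Turn a JSON-schema 'parameters' object into keyword-only parameters."""
    required = set(schema.get("required", []))
    params = []
    for name, spec in schema.get("properties", {}).items():
        params.append(
            inspect.Parameter(
                name,
                inspect.Parameter.KEYWORD_ONLY,
                # Required parameters get no default; optional ones default to None
                default=inspect.Parameter.empty if name in required else None,
                annotation=_JSON_TO_PY.get(spec.get("type"), inspect.Parameter.empty),
            )
        )
    return inspect.Signature(params)


async def tool_method(**kwargs):
    return kwargs


# Hypothetical schema for a bash-like tool
bash_schema = {
    "type": "object",
    "properties": {"command": {"type": "string"}, "timeout": {"type": "integer"}},
    "required": ["command"],
}

tool_method.__name__ = "bash"
tool_method.__signature__ = build_signature(bash_schema)

sig = inspect.signature(tool_method)  # inspect honors the __signature__ attribute
print(sig)
```

Keyword-only parameters are used because the wrapper is always invoked with named arguments. That choice also avoids the ordering rule that forbids a required positional parameter after an optional one.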
2. MCPClients (app/tool/mcp.py): Client-Side Tool Representation
The MCPAgent uses this class, which acts like a ToolCollection, but its tools are proxies that make calls to the remote server.
# Simplified snippet from app/tool/mcp.py
from contextlib import AsyncExitStack
from typing import List, Optional

from mcp import ClientSession, StdioServerParameters  # MCP client-side library
from mcp.client.stdio import stdio_client  # stdio transport handler
from mcp.types import TextContent

from app.logger import logger
from app.tool.base import BaseTool, ToolResult
from app.tool.tool_collection import ToolCollection


# Represents a single tool on the server, callable from the client
class MCPClientTool(BaseTool):
    session: Optional[ClientSession] = None  # Holds the connection

    async def execute(self, **kwargs) -> ToolResult:
        """Execute by calling the remote tool via the MCP session."""
        if not self.session:
            return ToolResult(error="Not connected")
        try:
            # Make the actual remote call
            result = await self.session.call_tool(self.name, kwargs)
            # Extract text output from the response
            content = ", ".join(
                item.text for item in result.content if isinstance(item, TextContent)
            )
            return ToolResult(output=content or "No output.")
        except Exception as e:
            return ToolResult(error=f"MCP tool error: {e}")


# The collection holding the proxy tools
class MCPClients(ToolCollection):
    session: Optional[ClientSession] = None
    exit_stack: Optional[AsyncExitStack] = None  # Manages connection resources

    async def connect_stdio(self, command: str, args: List[str]) -> None:
        """Connect using stdio."""
        if self.session:
            await self.disconnect()
        self.exit_stack = AsyncExitStack()

        # Describe the server process to launch, then open the stdio streams
        server_params = StdioServerParameters(command=command, args=args)
        streams = await self.exit_stack.enter_async_context(
            stdio_client(server_params)
        )
        # Establish the MCP session over the connection
        self.session = await self.exit_stack.enter_async_context(
            ClientSession(*streams)
        )
        await self._initialize_and_list_tools()  # Get tool list from server

    async def _initialize_and_list_tools(self) -> None:
        """Fetch tools from server and create proxy objects."""
        await self.session.initialize()
        response = await self.session.list_tools()  # Ask server for tools
        self.tool_map = {}
        for tool_info in response.tools:
            # Create an MCPClientTool instance for each server tool
            proxy_tool = MCPClientTool(
                name=tool_info.name,
                description=tool_info.description,
                parameters=tool_info.inputSchema,  # Use schema from server
                session=self.session,  # Pass the active session
            )
            self.tool_map[tool_info.name] = proxy_tool
        self.tools = tuple(self.tool_map.values())
        logger.info(f"MCP Client found tools: {list(self.tool_map.keys())}")

    async def disconnect(self) -> None:
        if self.session and self.exit_stack:
            await self.exit_stack.aclose()  # Clean up connection
            # ... reset state ...
Explanation: MCPClients handles the connection (connect_stdio). When connected, it calls list_tools on the server. For each tool reported by the server, it creates a local MCPClientTool proxy object. This proxy object looks like a normal BaseTool (with name, description, parameters), but its execute method doesn’t run code locally; instead, it uses the active ClientSession to send a call_tool request back to the server.
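The proxy idea can be tried without an MCP server by swapping the session for a fake. In the sketch below, FakeSession, TextItem, FakeCallResult, and ProxyTool are hypothetical stand-ins that mirror only the shapes used in the snippet above (call_tool returning an object whose content holds text items), and plain dictionaries replace ToolResult.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TextItem:
    text: str


@dataclass
class FakeCallResult:
    content: list


class FakeSession:
    """Hypothetical stand-in for an MCP client session."""

    async def call_tool(self, name: str, arguments: dict) -> FakeCallResult:
        return FakeCallResult(content=[TextItem(f"{name} ran with {arguments}")])


@dataclass
class ProxyTool:
    """Looks like a local tool, but execute() forwards the call to the session."""

    name: str
    session: Optional[Any] = None

    async def execute(self, **kwargs) -> dict:
        if self.session is None:
            return {"error": "Not connected"}
        try:
            result = await self.session.call_tool(self.name, kwargs)
        except Exception as exc:
            return {"error": f"MCP tool error: {exc}"}
        # Keep only the text items, as the real proxy does
        text = ", ".join(
            item.text for item in result.content if isinstance(item, TextItem)
        )
        return {"output": text or "No output."}


connected = asyncio.run(ProxyTool("bash", FakeSession()).execute(command="ls"))
offline = asyncio.run(ProxyTool("bash").execute(command="ls"))
print(connected)
print(offline)
```

Returning an error value instead of raising keeps a failed remote call inside the normal tool-result path, so the agent can report it to the LLM like any other tool output.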
3. MCPAgent (app/agent/mcp.py): Using MCPClients
The agent integrates the MCPClients collection.
# Simplified snippet from app/agent/mcp.py
from typing import List, Optional

from pydantic import Field

from app.agent.toolcall import ToolCallAgent
from app.tool.mcp import MCPClients


class MCPAgent(ToolCallAgent):
    # Use MCPClients as the tool collection
    mcp_clients: MCPClients = Field(default_factory=MCPClients)
    available_tools: Optional[MCPClients] = None  # Will point to mcp_clients
    connection_type: str = "stdio"
    # ... other fields ...

    async def initialize(
        self,
        command: Optional[str] = None,
        args: Optional[List[str]] = None,
        # ... other parameters omitted ...
    ) -> None:
        """Initialize by connecting the MCPClients instance."""
        if self.connection_type == "stdio":
            # Tell mcp_clients to connect
            await self.mcp_clients.connect_stdio(command=command, args=args or [])
        # elif self.connection_type == "sse": ...

        # The agent's tools are now the tools provided by the server
        self.available_tools = self.mcp_clients
        # Store initial tool schemas for detecting changes later
        self.tool_schemas = {t.name: t.parameters for t in self.available_tools}
        # Add system message about tools...

    async def _refresh_tools(self) -> None:
        """Periodically check the server for tool updates."""
        if not self.mcp_clients.session:
            return
        # Ask the server for its current list of tools
        response = await self.mcp_clients.session.list_tools()
        current_tools = {t.name: t.inputSchema for t in response.tools}
        # Compare with stored schemas (self.tool_schemas)
        # Detect added/removed tools and update self.tool_schemas
        # Add system messages to memory if tools change
        # ... logic to detect and log changes ...

    async def think(self) -> bool:
        """Agent's thinking step."""
        # Refresh tools periodically
        if self.current_step % self._refresh_tools_interval == 0:
            await self._refresh_tools()
            # Stop if server seems gone (no tools left)
            if not self.mcp_clients.tool_map:
                return False
        # Use parent class's think method, which uses self.available_tools
        # (which points to self.mcp_clients) for tool decisions/calls
        return await super().think()

    async def cleanup(self) -> None:
        """Disconnect the MCP session when the agent finishes."""
        if self.mcp_clients.session:
            await self.mcp_clients.disconnect()
Explanation: The MCPAgent holds an instance of MCPClients. In initialize, it tells MCPClients to connect to the server. It sets its own available_tools to point to the MCPClients instance. When the agent’s think method (inherited from ToolCallAgent) needs to consider or execute tools, it uses self.available_tools. Because this is the MCPClients object, any tool execution results in a remote call to the MCPServer via the proxy tools. The agent also adds logic to periodically refresh its tools (_refresh_tools) and to clean up the connection (cleanup).
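The comparison step elided inside _refresh_tools can be sketched as a pure function over two name-to-schema dictionaries. The function name diff_tools and the sample schemas are assumptions for illustration; the actual OpenManus logic may differ, for example in how it reports changes to the agent's memory.

```python
from typing import Dict, List, Tuple


def diff_tools(
    old: Dict[str, dict], new: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) tool names between two schema snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A tool counts as changed when it exists in both snapshots with a different schema
    changed = sorted(name for name in old.keys() & new.keys() if old[name] != new[name])
    return added, removed, changed


# Hypothetical snapshots taken before and after a server update
before = {
    "bash": {"properties": {"command": {"type": "string"}}},
    "editor": {"properties": {"path": {"type": "string"}}},
}
after = {
    "bash": {"properties": {"command": {"type": "string"}, "timeout": {"type": "integer"}}},
    "browser": {"properties": {"url": {"type": "string"}}},
}

added, removed, changed = diff_tools(before, after)
print("added:", added, "removed:", removed, "changed:", changed)
```

Keeping the comparison separate from the network call makes it easy to test, and the three lists map directly onto the system messages the agent adds when its tool set changes.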
Wrapping Up Chapter 9
Congratulations on completing the core concepts tutorial!
In this final chapter, we explored the Model Context Protocol (MCP). You learned how MCP allows an MCPAgent to connect to an external MCPServer and dynamically discover and use tools hosted by that server. This provides a powerful way to extend agent capabilities with specialized tools without modifying the agent’s core code, enabling a flexible, plug-and-play architecture for agent skills.
You’ve journeyed through the essential building blocks of OpenManus:
- The “brain” (LLM)
- Conversation history (Message / Memory)
- The agent structure (BaseAgent)
- Agent skills (Tool / ToolCollection)
- Multi-step task orchestration (BaseFlow)
- Data structure definitions (Schema)
- Settings management (Configuration (Config))
- Secure code execution (DockerSandbox)
- And dynamic external tools (MCP)
Armed with this knowledge, you’re now well-equipped to start exploring the OpenManus codebase, experimenting with different agents and tools, and building your own intelligent applications! Good luck!
Generated by AI Codebase Knowledge Builder