# Chapter 5: ChatCompletionClient - Talking to the Brains
So far, we’ve learned about:
- Agents: The workers in our system.
- Messaging: How agents communicate broadly.
- AgentRuntime: The manager that runs the show.
- Tools: How agents get specific skills.
But how does an agent actually think or generate text? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their “brains”. How does an agent in AutoGen Core communicate with these external LLM services?
This is where the `ChatCompletionClient` comes in. It's the dedicated component for talking to LLMs.
## Motivation: Bridging the Gap to LLMs
Imagine you want to build an agent that can summarize long articles.
- You give the agent an article (as a message).
- The agent needs to send this article to an LLM (like GPT-4).
- It also needs to tell the LLM: “Please summarize this.”
- The LLM processes the request and generates a summary.
- The agent needs to receive this summary back from the LLM.
How does the agent handle the technical details of connecting to the LLM’s specific API, formatting the request correctly, sending it over the internet, and understanding the response?
The `ChatCompletionClient` solves this! Think of it as the standard phone line and translator connecting your agent to the LLM service. You tell the client *what* to say (the conversation history and instructions), and it handles *how* to say it to the specific LLM and translates the LLM's reply back into a standard format.
## Key Concepts: Understanding the LLM Communicator

Let's break down the `ChatCompletionClient`:
- **LLM Communication Bridge:** It's the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.
- **Standard Interface (`create` method):** It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is `create`. You give it:
  - `messages`: A list of messages representing the conversation history so far.
  - Optional `tools`: A list of tools (Chapter 4) the LLM might be able to use.
  - Other parameters (like `json_output` hints and a `cancellation_token`).
- **Messages (`LLMMessage`):** The conversation history is passed as a sequence of specific message types defined in `autogen_core.models`:
  - `SystemMessage`: Instructions for the LLM (e.g., "You are a helpful assistant.").
  - `UserMessage`: Input from the user or another agent (e.g., the article text).
  - `AssistantMessage`: Previous responses from the LLM (can include text or requests to call functions/tools).
  - `FunctionExecutionResultMessage`: The results of executing a tool/function call.
- **Tools (`ToolSchema`):** You can provide the schemas of available tools (Chapter 4). The LLM might then respond not with text, but with a request to call one of these tools (a `FunctionCall` inside an `AssistantMessage`).
- **Response (`CreateResult`):** The `create` method returns a standard `CreateResult` object containing:
  - `content`: The LLM's generated text or a list of `FunctionCall` requests.
  - `finish_reason`: Why the LLM stopped generating (e.g., "stop", "length", "function_calls").
  - `usage`: How many input (`prompt_tokens`) and output (`completion_tokens`) tokens were used.
  - `cached`: Whether the response came from a cache.
- **Token Tracking:** The client automatically tracks token usage (`prompt_tokens`, `completion_tokens`) for each call. You can query the total usage via methods like `total_usage()`. This is vital for monitoring costs, as most LLM APIs charge based on tokens. (A short sketch after this list shows how these pieces fit together in a single call.)
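
To see how these pieces fit together, here is a minimal sketch of a single `create` call that offers one tool and then inspects the result. It assumes an already-configured `client` (any `ChatCompletionClient` implementation) and a `get_weather_tool` schema from Chapter 4; both names are placeholders, not part of the library.

```python
# Minimal sketch: `client` and `get_weather_tool` are assumed to exist already.
from autogen_core.models import SystemMessage, UserMessage

async def ask_with_tools(client, get_weather_tool):
    messages = [
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's the weather in Paris?", source="User"),
    ]
    result = await client.create(messages=messages, tools=[get_weather_tool])

    if isinstance(result.content, str):
        # The LLM answered directly with text.
        print("Text reply:", result.content)
    else:
        # The LLM asked us to run one or more tools instead of answering directly.
        for call in result.content:  # each item is a FunctionCall
            print("Tool requested:", call.name, "with arguments:", call.arguments)

    # Per-call token usage is always available on the result.
    print("Tokens used:", result.usage.prompt_tokens, "+", result.usage.completion_tokens)
```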
## Use Case Example: Summarizing Text with an LLM

Let's build a simplified scenario where we use a `ChatCompletionClient` to ask an LLM to summarize text.

**Goal:** Send text to an LLM via a client and get a summary back.
### Step 1: Prepare the Input Messages

We need to structure our request as a list of `LLMMessage` objects.
```python
# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""

user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User"  # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...")  # Print the first 50 chars of each message
```
This code defines the instructions (`SystemMessage`) and the user's request (`UserMessage`) and puts them in a list, ready to be sent.
### Step 2: Use the ChatCompletionClient (Conceptual)

Now we need an instance of a `ChatCompletionClient`. In a real application, you'd configure a specific client (like `OpenAIChatCompletionClient` with your API key). For this example, let's imagine we have a pre-configured client called `llm_client`.
```python
# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage

# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., llm_client = OpenAIChatCompletionClient(...))

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get a structured result back
        response: CreateResult = await client.create(
            messages=messages,
            # We aren't providing tools in this simple example
            tools=[]
        )
        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}")  # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also check the total usage tracked by the client
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")
    except Exception as e:
        print(f"An error occurred: {e}")

# --- Placeholder for an actual client ---
class MockChatCompletionClient:  # Simulates a real client
    def __init__(self):
        # Track cumulative usage per client instance
        self._total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate the API call and response
        prompt_len = sum(len(str(m.content)) for m in messages) // 4  # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4  # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop", content=summary, usage=usage, cached=False
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage

    # Other required methods (count_tokens, model_info, etc.) omitted for brevity

async def main():
    from prepare_messages import messages_to_send  # Get messages from the previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main())  # If you run this, it uses the mock client
```
This code shows the essential `client.create(...)` call. We pass our `messages_to_send` and receive a `CreateResult`. We then print the summary (`response.content`), the token usage reported for that specific call (`response.usage`), and the total tracked by the client (`client.total_usage()`).
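
To run this against a real model instead of the mock, you would construct a concrete client. Here is a minimal configuration sketch, assuming the `autogen-ext` package with OpenAI support is installed; the exact import path and constructor options can vary between versions, and the API key is usually read from the `OPENAI_API_KEY` environment variable.

```python
# Hypothetical setup sketch -- adjust the model name and package version to your environment.
from autogen_ext.models.openai import OpenAIChatCompletionClient

llm_client = OpenAIChatCompletionClient(
    model="gpt-4o",        # any chat-completion model your account can access
    # api_key="sk-...",    # usually picked up from the OPENAI_API_KEY environment variable
)

# The same helper from above then works unchanged:
# await get_summary(llm_client, messages_to_send)
```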
**How an Agent Uses It:** Typically, an agent's logic (e.g., inside its `on_message` handler) would:

- Receive an incoming message (like the article to summarize).
- Prepare the list of `LLMMessage` objects (including system prompts, history, and the new request).
- Access a `ChatCompletionClient` instance (often provided during agent setup or accessed via its context).
- Call `await client.create(...)`.
- Process the `CreateResult` (e.g., extract the summary text, check for function calls if tools were provided).
- Potentially send the result as a new message to another agent or return it (see the agent sketch after this list).
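
To make that concrete, here is a minimal sketch of an agent that holds a client and uses it inside a message handler. It assumes `autogen_core`'s `RoutedAgent` / `@message_handler` pattern and a custom `ArticleMessage` dataclass defined for this example; treat the names and structure as illustrative rather than a fixed recipe.

```python
from dataclasses import dataclass

from autogen_core import MessageContext, RoutedAgent, message_handler
from autogen_core.models import ChatCompletionClient, SystemMessage, UserMessage


@dataclass
class ArticleMessage:
    text: str  # the article we want summarized (example message type, not part of the library)


class SummarizerAgent(RoutedAgent):
    def __init__(self, model_client: ChatCompletionClient) -> None:
        super().__init__("An agent that summarizes articles.")
        self._model_client = model_client  # injected during agent setup

    @message_handler
    async def handle_article(self, message: ArticleMessage, ctx: MessageContext) -> str:
        # 1. Prepare the LLMMessage list
        llm_messages = [
            SystemMessage(content="You summarize text in one sentence."),
            UserMessage(content=message.text, source="User"),
        ]
        # 2. Call the client
        result = await self._model_client.create(
            llm_messages, cancellation_token=ctx.cancellation_token
        )
        # 3. Process the CreateResult (no tools offered, so we expect plain text)
        return result.content if isinstance(result.content, str) else str(result.content)
```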
## Under the Hood: How the Client Talks to the LLM

What happens when you call `await client.create(...)`?
**Conceptual Flow:**

```mermaid
sequenceDiagram
    participant Agent as Agent Logic
    participant Client as ChatCompletionClient
    participant Formatter as API Formatter
    participant HTTP as HTTP Client
    participant LLM_API as External LLM API

    Agent->>+Client: create(messages, tools)
    Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
    Formatter-->>-Client: Return formatted request body
    Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API key
    HTTP->>+LLM_API: Transmit request over network
    LLM_API->>LLM_API: Process request, generate completion/function call
    LLM_API-->>-HTTP: Return API response (e.g., JSON)
    HTTP-->>-Client: Receive HTTP response
    Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
    Formatter-->>-Client: Return parsed data
    Client->>Client: Create standard CreateResult object
    Client-->>-Agent: Return CreateResult
```
- **Prepare:** The `ChatCompletionClient` takes the standard `LLMMessage` list and `ToolSchema` list.
- **Format:** It translates these into the specific format required by the target LLM's API (e.g., the JSON structure expected by OpenAI's `/chat/completions` endpoint). This might involve renaming roles (like `SystemMessage` to `system`), formatting tool descriptions, etc. (see the sketch after this list).
- **Request:** It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service's API endpoint, including the formatted data and authentication (like an API key).
- **Wait & Receive:** It waits for the LLM service to process the request and send back a response over the network.
- **Parse:** It receives the raw HTTP response (usually JSON) from the API.
- **Standardize:** It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
- **Return:** It packages all this information into a standard `CreateResult` object and returns it to the calling agent code.
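
To illustrate the "Format" step, here is a rough, illustrative sketch of how an OpenAI-style client might turn `LLMMessage` objects into the role/content dicts the `/chat/completions` endpoint expects. The real `OpenAIChatCompletionClient` handles many more cases (images, tool calls, names, etc.); this only shows the idea.

```python
# Illustrative only -- not the library's actual conversion code.
from autogen_core.models import (
    AssistantMessage,
    LLMMessage,
    SystemMessage,
    UserMessage,
)


def to_openai_format(messages: list[LLMMessage]) -> list[dict]:
    """Map AutoGen's standard message types onto OpenAI-style role/content dicts."""
    converted: list[dict] = []
    for msg in messages:
        if isinstance(msg, SystemMessage):
            converted.append({"role": "system", "content": msg.content})
        elif isinstance(msg, UserMessage):
            converted.append({"role": "user", "content": msg.content})
        elif isinstance(msg, AssistantMessage) and isinstance(msg.content, str):
            converted.append({"role": "assistant", "content": msg.content})
        else:
            # Function-call requests/results need API-specific handling (omitted here).
            raise NotImplementedError(f"Unhandled message type: {type(msg).__name__}")
    return converted
```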
**Code Glimpse:**

- `ChatCompletionClient` Protocol (`models/_model_client.py`): This is the abstract base class (or protocol) defining the contract that all specific clients must follow.

```python
# From: models/_model_client.py (Simplified ABC)
from abc import ABC, abstractmethod
from typing import Any, AsyncGenerator, Mapping, Optional, Sequence, Union

from ._types import LLMMessage, CreateResult, RequestUsage
from ..tools import Tool, ToolSchema
from .. import CancellationToken

class ChatCompletionClient(ABC):
    @abstractmethod
    async def create(
        self,
        messages: Sequence[LLMMessage],
        *,
        tools: Sequence[Tool | ToolSchema] = [],
        json_output: Optional[bool] = None,                      # Hint for JSON mode
        extra_create_args: Mapping[str, Any] = {},               # API-specific args
        cancellation_token: Optional[CancellationToken] = None,
    ) -> CreateResult: ...  # The core method

    @abstractmethod
    def create_stream(
        self,
        # Similar to create, but yields results incrementally
        # ... parameters ...
    ) -> AsyncGenerator[Union[str, CreateResult], None]: ...

    @abstractmethod
    def total_usage(self) -> RequestUsage: ...  # Get total tracked usage

    @abstractmethod
    def count_tokens(
        self,
        messages: Sequence[LLMMessage],
        *,
        tools: Sequence[Tool | ToolSchema] = [],
    ) -> int: ...  # Estimate token count

    # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
```

  Concrete classes like `OpenAIChatCompletionClient`, `AnthropicChatCompletionClient`, etc., implement these methods using the specific libraries and API calls for each service.

- `LLMMessage` Types (`models/_types.py`): These define the structure of messages passed to the client.

```python
# From: models/_types.py (Simplified)
from typing import List, Literal, Union

from pydantic import BaseModel

from .. import FunctionCall, Image  # FunctionCall from Chapter 4; Image is autogen_core's image type

class SystemMessage(BaseModel):
    content: str
    type: Literal["SystemMessage"] = "SystemMessage"

class UserMessage(BaseModel):
    content: Union[str, List[Union[str, Image]]]  # Can include images!
    source: str
    type: Literal["UserMessage"] = "UserMessage"

class AssistantMessage(BaseModel):
    content: Union[str, List[FunctionCall]]  # Can be text or function calls
    source: str
    type: Literal["AssistantMessage"] = "AssistantMessage"

# FunctionExecutionResultMessage also exists here...
```

- `CreateResult` (`models/_types.py`): This defines the structure of the response from the client.

```python
# From: models/_types.py (Simplified)
from dataclasses import dataclass
from typing import List, Literal, Union

from pydantic import BaseModel

from .. import FunctionCall

@dataclass
class RequestUsage:
    prompt_tokens: int
    completion_tokens: int

FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]

class CreateResult(BaseModel):
    finish_reason: FinishReasons
    content: Union[str, List[FunctionCall]]  # LLM output
    usage: RequestUsage                      # Token usage for this call
    cached: bool
    # Optional fields like logprobs, thought...
```

Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different `ChatCompletionClient` implementation.
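
The protocol above also defines `create_stream`, which yields the reply incrementally instead of waiting for the full completion. A minimal usage sketch, assuming the stream yields text chunks (`str`) followed by a final `CreateResult`, as the signature above suggests; `client` and `messages_to_send` are the objects from the earlier steps.

```python
# Hedged sketch of streaming with any ChatCompletionClient implementation.
from autogen_core.models import CreateResult


async def stream_summary(client, messages) -> None:
    final_result: CreateResult | None = None
    async for chunk in client.create_stream(messages=messages):
        if isinstance(chunk, str):
            # Partial text as it is generated -- print it as it arrives.
            print(chunk, end="", flush=True)
        else:
            # The last item is the complete CreateResult with usage info.
            final_result = chunk
    if final_result is not None:
        print(f"\nFinish reason: {final_result.finish_reason}")
```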
## Next Steps

You now understand the role of `ChatCompletionClient` as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function-call requests, and track token usage.
Managing the conversation history (`messages`) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?

- **Chapter 6: ChatCompletionContext**: Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the `ChatCompletionClient`.