Chapter 5: ChatCompletionClient - Talking to the Brains

So far, we’ve learned about:

  • Agents: The workers in our system.
  • Messaging: How agents communicate broadly.
  • AgentRuntime: The manager that runs the show.
  • Tools: How agents get specific skills.

But how does an agent actually think or generate text? Many powerful agents rely on Large Language Models (LLMs) – think of models like GPT-4, Claude, or Gemini – as their “brains”. How does an agent in AutoGen Core communicate with these external LLM services?

This is where the ChatCompletionClient comes in. It’s the dedicated component for talking to LLMs.

Motivation: Bridging the Gap to LLMs

Imagine you want to build an agent that can summarize long articles.

  1. You give the agent an article (as a message).
  2. The agent needs to send this article to an LLM (like GPT-4).
  3. It also needs to tell the LLM: “Please summarize this.”
  4. The LLM processes the request and generates a summary.
  5. The agent needs to receive this summary back from the LLM.

How does the agent handle the technical details of connecting to the LLM’s specific API, formatting the request correctly, sending it over the internet, and understanding the response?

The ChatCompletionClient solves this! Think of it as the standard phone line and translator connecting your agent to the LLM service. You tell the client what to say (the conversation history and instructions), and it handles how to say it to the specific LLM and translates the LLM’s reply back into a standard format.

Key Concepts: Understanding the LLM Communicator

Let’s break down the ChatCompletionClient:

  1. LLM Communication Bridge: It’s the primary way AutoGen agents interact with external LLM APIs (like OpenAI, Anthropic, Google Gemini, etc.). It hides the complexity of specific API calls.

  2. Standard Interface (create method): It defines a common way to send requests and receive responses, regardless of the underlying LLM. The core method is create. You give it:
    • messages: A list of messages representing the conversation history so far.
    • Optional tools: A list of tools (Chapter 4) the LLM might be able to use.
    • Other parameters (like json_output hints, cancellation_token).
  3. Messages (LLMMessage): The conversation history is passed as a sequence of specific message types defined in autogen_core.models:
    • SystemMessage: Instructions for the LLM (e.g., “You are a helpful assistant.”).
    • UserMessage: Input from the user or another agent (e.g., the article text).
    • AssistantMessage: Previous responses from the LLM (can include text or requests to call functions/tools).
    • FunctionExecutionResultMessage: The results of executing a tool/function call.
  4. Tools (ToolSchema): You can provide the schemas of available tools (Chapter 4). The LLM might then respond not with text, but with a request to call one of these tools (FunctionCall inside an AssistantMessage).

  5. Response (CreateResult): The create method returns a standard CreateResult object containing:
    • content: The LLM’s generated text or a list of FunctionCall requests (see the sketch after this list).
    • finish_reason: Why the LLM stopped generating (e.g., “stop”, “length”, “function_calls”).
    • usage: How many input (prompt_tokens) and output (completion_tokens) tokens were used.
    • cached: Whether the response came from a cache.
  6. Token Tracking: The client automatically tracks token usage (prompt_tokens, completion_tokens) for each call. You can query the total usage via methods like total_usage(). This is vital for monitoring costs, as most LLM APIs charge based on tokens.
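
Since content can be either plain text or a list of FunctionCall requests, agent code typically checks which form it received before acting on it. Here is a minimal sketch using only the standard types described above (the describe_result helper is illustrative, not part of AutoGen):

from autogen_core.models import CreateResult

def describe_result(result: CreateResult) -> str:
    # content is either generated text or a list of FunctionCall requests
    if isinstance(result.content, str):
        return f"Text reply: {result.content}"
    return f"LLM requested {len(result.content)} tool call(s)"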

Use Case Example: Summarizing Text with an LLM

Let’s build a simplified scenario where we use a ChatCompletionClient to ask an LLM to summarize text.

Goal: Send text to an LLM via a client and get a summary back.

Step 1: Prepare the Input Messages

We need to structure our request as a list of LLMMessage objects.

# File: prepare_messages.py
from autogen_core.models import SystemMessage, UserMessage

# Instructions for the LLM
system_prompt = SystemMessage(
    content="You are a helpful assistant designed to summarize text concisely."
)

# The text we want to summarize
article_text = """
AutoGen is a framework that enables the development of LLM applications using multiple agents
that can converse with each other to solve tasks. AutoGen agents are customizable,
conversable, and can seamlessly allow human participation. They can operate in various modes
that employ combinations of LLMs, human inputs, and tools.
"""
user_request = UserMessage(
    content=f"Please summarize the following text in one sentence:\n\n{article_text}",
    source="User" # Indicate who provided this input
)

# Combine into a list for the client
messages_to_send = [system_prompt, user_request]

print("Messages prepared:")
for msg in messages_to_send:
    print(f"- {msg.type}: {msg.content[:50]}...") # Print first 50 chars

This code defines the instructions (SystemMessage) and the user’s request (UserMessage) and puts them in a list, ready to be sent.

Step 2: Use the ChatCompletionClient (Conceptual)

Now, we need an instance of a ChatCompletionClient. In a real application, you’d configure a specific client (like OpenAIChatCompletionClient with your API key). For this example, let’s imagine we have a pre-configured client called llm_client.
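
As a reference point, configuring a real client might look roughly like the sketch below. The import path and constructor arguments are assumptions that depend on your installed AutoGen version and extension packages, so verify them before use:

# Hypothetical configuration sketch -- check the import path and parameter
# names against your installed AutoGen / autogen_ext version.
from autogen_ext.models.openai import OpenAIChatCompletionClient  # assumed location

llm_client = OpenAIChatCompletionClient(
    model="gpt-4o",          # which model to call (assumed parameter name)
    api_key="YOUR_API_KEY",  # often read from the OPENAI_API_KEY env var instead
)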

# File: call_llm_client.py
import asyncio
from autogen_core.models import CreateResult, RequestUsage
# Assume 'messages_to_send' is from the previous step
# Assume 'llm_client' is a pre-configured ChatCompletionClient instance
# (e.g., an OpenAIChatCompletionClient configured as sketched above)

async def get_summary(client, messages):
    print("\nSending messages to LLM via ChatCompletionClient...")
    try:
        # The core call: send messages, get structured result
        response: CreateResult = await client.create(
            messages=messages,
            # We aren't providing tools in this simple example
            tools=[]
        )
        print("Received response:")
        print(f"- Finish Reason: {response.finish_reason}")
        print(f"- Content: {response.content}") # This should be the summary
        print(f"- Usage (Tokens): Prompt={response.usage.prompt_tokens}, Completion={response.usage.completion_tokens}")
        print(f"- Cached: {response.cached}")

        # Also, check total usage tracked by the client
        total_usage = client.total_usage()
        print(f"\nClient Total Usage: Prompt={total_usage.prompt_tokens}, Completion={total_usage.completion_tokens}")

    except Exception as e:
        print(f"An error occurred: {e}")

# --- Placeholder for actual client ---
class MockChatCompletionClient: # Simulate a real client
    def __init__(self):
        # Track cumulative usage for this client instance
        self._total_usage = RequestUsage(prompt_tokens=0, completion_tokens=0)

    async def create(self, messages, tools=[], **kwargs) -> CreateResult:
        # Simulate an API call: estimate tokens and return a canned summary
        prompt_len = sum(len(str(m.content)) for m in messages) // 4 # Rough token estimate
        summary = "AutoGen is a multi-agent framework for developing LLM applications."
        completion_len = len(summary) // 4 # Rough token estimate
        usage = RequestUsage(prompt_tokens=prompt_len, completion_tokens=completion_len)
        self._total_usage.prompt_tokens += usage.prompt_tokens
        self._total_usage.completion_tokens += usage.completion_tokens
        return CreateResult(
            finish_reason="stop", content=summary, usage=usage, cached=False
        )

    def total_usage(self) -> RequestUsage:
        return self._total_usage

    # Other required methods (count_tokens, model_info, etc.) omitted for brevity

async def main():
    from prepare_messages import messages_to_send # Get messages from previous step
    mock_client = MockChatCompletionClient()
    await get_summary(mock_client, messages_to_send)

# asyncio.run(main()) # If you run this, it uses the mock client

This code shows the essential client.create(...) call. We pass our messages_to_send and receive a CreateResult. We then print the summary (response.content) and the token usage reported for that specific call (response.usage) and the total tracked by the client (client.total_usage()).

How an Agent Uses It: Typically, an agent’s logic (e.g., inside its on_message handler) would do the following (a sketch appears after the list):

  1. Receive an incoming message (like the article to summarize).
  2. Prepare the list of LLMMessage objects (including system prompts, history, and the new request).
  3. Access a ChatCompletionClient instance (often provided during agent setup or accessed via its context).
  4. Call await client.create(...).
  5. Process the CreateResult (e.g., extract the summary text, check for function calls if tools were provided).
  6. Potentially send the result as a new message to another agent or return it.
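
Putting those steps together, a sketch of such an agent might look like the following. The SummarizerAgent and ArticleMessage names are illustrative, and the decorator and handler signature should be checked against your AutoGen version:

# Hypothetical agent sketch -- illustrative, not a drop-in implementation.
from dataclasses import dataclass
from autogen_core import RoutedAgent, MessageContext, message_handler
from autogen_core.models import ChatCompletionClient, SystemMessage, UserMessage

@dataclass
class ArticleMessage:  # Hypothetical incoming message type
    text: str

class SummarizerAgent(RoutedAgent):
    def __init__(self, model_client: ChatCompletionClient) -> None:
        super().__init__("An agent that summarizes articles.")
        self._model_client = model_client  # Client provided during agent setup

    @message_handler
    async def handle_article(self, message: ArticleMessage, ctx: MessageContext) -> str:
        # Steps 2-5: build the LLM messages, call the client, process the result
        llm_messages = [
            SystemMessage(content="You are a helpful assistant that summarizes text concisely."),
            UserMessage(content=f"Please summarize:\n\n{message.text}", source="User"),
        ]
        result = await self._model_client.create(
            messages=llm_messages, cancellation_token=ctx.cancellation_token
        )
        # No tools were provided, so content should be plain text here
        return result.content if isinstance(result.content, str) else str(result.content)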

Under the Hood: How the Client Talks to the LLM

What happens when you call await client.create(...)?

Conceptual Flow:

sequenceDiagram
    participant Agent as Agent Logic
    participant Client as ChatCompletionClient
    participant Formatter as API Formatter
    participant HTTP as HTTP Client
    participant LLM_API as External LLM API

    Agent->>+Client: create(messages, tools)
    Client->>+Formatter: Format messages & tools for specific API (e.g., OpenAI JSON format)
    Formatter-->>-Client: Return formatted request body
    Client->>+HTTP: Send POST request to LLM API endpoint with formatted body & API Key
    HTTP->>+LLM_API: Transmit request over network
    LLM_API->>LLM_API: Process request, generate completion/function call
    LLM_API-->>-HTTP: Return API response (e.g., JSON)
    HTTP-->>-Client: Receive HTTP response
    Client->>+Formatter: Parse API response (extract content, usage, finish_reason)
    Formatter-->>-Client: Return parsed data
    Client->>Client: Create standard CreateResult object
    Client-->>-Agent: Return CreateResult

  1. Prepare: The ChatCompletionClient takes the standard LLMMessage list and ToolSchema list.
  2. Format: It translates these into the specific format required by the target LLM’s API (e.g., the JSON structure expected by OpenAI’s /chat/completions endpoint). This might involve renaming roles (like SystemMessage to system), formatting tool descriptions, etc. (A simplified sketch of this translation appears after this list.)
  3. Request: It uses an underlying HTTP client to send a network request (usually a POST request) to the LLM service’s API endpoint, including the formatted data and authentication (like an API key).
  4. Wait & Receive: It waits for the LLM service to process the request and send back a response over the network.
  5. Parse: It receives the raw HTTP response (usually JSON) from the API.
  6. Standardize: It parses this specific API response, extracting the generated text or function calls, token usage figures, finish reason, etc.
  7. Return: It packages all this information into a standard CreateResult object and returns it to the calling agent code.
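
To make the Format step concrete, here is a deliberately simplified, illustrative translation from LLMMessage objects into an OpenAI-style request payload. Real clients handle many more cases (images, tool schemas, function results); this is not the actual implementation:

from autogen_core.models import SystemMessage, UserMessage, AssistantMessage, LLMMessage

def to_openai_style_payload(messages: list[LLMMessage], model: str) -> dict:
    # Map each standard message type onto the role/content dicts the API expects
    role_map = {SystemMessage: "system", UserMessage: "user", AssistantMessage: "assistant"}
    api_messages = [
        {"role": role_map.get(type(msg), "user"), "content": msg.content}
        for msg in messages
    ]
    return {"model": model, "messages": api_messages}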

Code Glimpse:

  • ChatCompletionClient Protocol (models/_model_client.py): This is the abstract base class (or protocol) defining the contract that all specific clients must follow.

    # From: models/_model_client.py (Simplified ABC)
    from abc import ABC, abstractmethod
    from typing import Sequence, Optional, Mapping, Any, AsyncGenerator, Union
    from ._types import LLMMessage, CreateResult, RequestUsage
    from ..tools import Tool, ToolSchema
    from .. import CancellationToken
    
    class ChatCompletionClient(ABC):
        @abstractmethod
        async def create(
            self, messages: Sequence[LLMMessage], *,
            tools: Sequence[Tool | ToolSchema] = [],
            json_output: Optional[bool] = None, # Hint for JSON mode
            extra_create_args: Mapping[str, Any] = {}, # API-specific args
            cancellation_token: Optional[CancellationToken] = None,
        ) -> CreateResult: ... # The core method
    
        @abstractmethod
        def create_stream(
            self, # Similar to create, but yields results incrementally
            # ... parameters ...
        ) -> AsyncGenerator[Union[str, CreateResult], None]: ...
    
        @abstractmethod
        def total_usage(self) -> RequestUsage: ... # Get total tracked usage
    
        @abstractmethod
        def count_tokens(self, messages: Sequence[LLMMessage], *, tools: Sequence[Tool | ToolSchema] = []) -> int: ... # Estimate token count
    
        # Other methods like close(), actual_usage(), remaining_tokens(), model_info...
    

    Concrete classes like OpenAIChatCompletionClient, AnthropicChatCompletionClient etc., implement these methods using the specific libraries and API calls for each service.
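
    As the protocol shows, create_stream yields partial results incrementally: string chunks as they arrive, followed by a final CreateResult. A usage sketch (assuming a configured client such as the llm_client from earlier):

    # Streaming usage sketch -- chunk handling is illustrative
    async def stream_summary(client, messages):
        async for item in client.create_stream(messages=messages):
            if isinstance(item, str):
                print(item, end="", flush=True)  # Partial text as it streams in
            else:
                # The final item is a CreateResult with usage and finish_reason
                print(f"\nFinish reason: {item.finish_reason}, "
                      f"completion tokens: {item.usage.completion_tokens}")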

  • LLMMessage Types (models/_types.py): These define the structure of messages passed to the client.

    # From: models/_types.py (Simplified)
    from pydantic import BaseModel
    from typing import List, Union, Literal
    from .. import FunctionCall, Image # FunctionCall from Chapter 4; Image enables multimodal content
    
    class SystemMessage(BaseModel):
        content: str
        type: Literal["SystemMessage"] = "SystemMessage"
    
    class UserMessage(BaseModel):
        content: Union[str, List[Union[str, Image]]] # Can include images!
        source: str
        type: Literal["UserMessage"] = "UserMessage"
    
    class AssistantMessage(BaseModel):
        content: Union[str, List[FunctionCall]] # Can be text or function calls
        source: str
        type: Literal["AssistantMessage"] = "AssistantMessage"
    
    # FunctionExecutionResultMessage also exists here...
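
    For instance, an LLM’s request to call a tool is represented as an AssistantMessage whose content is a list of FunctionCall objects. The FunctionCall fields shown here (id, name, arguments) are assumptions carried over from Chapter 4; verify them against your version:

    from autogen_core import FunctionCall
    from autogen_core.models import AssistantMessage

    # The LLM asked to call a tool instead of replying with text
    tool_request = AssistantMessage(
        content=[FunctionCall(id="call_1", name="get_weather", arguments='{"city": "Paris"}')],
        source="assistant",
    )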
    
  • CreateResult (models/_types.py): This defines the structure of the response from the client.

    # From: models/_types.py (Simplified)
    from pydantic import BaseModel
    from dataclasses import dataclass
    from typing import Union, List, Optional, Literal
    from .. import FunctionCall
    
    @dataclass
    class RequestUsage:
        prompt_tokens: int
        completion_tokens: int
    
    FinishReasons = Literal["stop", "length", "function_calls", "content_filter", "unknown"]
    
    class CreateResult(BaseModel):
        finish_reason: FinishReasons
        content: Union[str, List[FunctionCall]] # LLM output
        usage: RequestUsage # Token usage for this call
        cached: bool
        # Optional fields like logprobs, thought...
    

    Using these standard types ensures that agent logic can work consistently, even if you switch the underlying LLM service by using a different ChatCompletionClient implementation.

Next Steps

You now understand the role of ChatCompletionClient as the crucial link between AutoGen agents and the powerful capabilities of Large Language Models. It provides a standard way to send conversational history and tool definitions, receive generated text or function call requests, and track token usage.

Managing the conversation history (messages) sent to the client is very important. How do you ensure the LLM has the right context, especially after tool calls have happened?

  • Chapter 6: ChatCompletionContext: Learn how AutoGen helps manage the conversation history, including adding tool call requests and their results, before sending it to the ChatCompletionClient.

Generated by AI Codebase Knowledge Builder