Chapter 1: Graph / StateGraph - The Blueprint of Your Application

Welcome to the LangGraph tutorial! We’re excited to help you learn how to build powerful, stateful applications with Large Language Models (LLMs).

Imagine you’re building an application, maybe a chatbot, an agent that performs tasks, or something that processes data in multiple steps. As these applications get more complex, just calling an LLM once isn’t enough. You need a way to structure the flow – maybe call an LLM, then a tool, then another LLM based on the result. How do you manage this sequence of steps and the information passed between them?

That’s where Graphs come in!

What Problem Do Graphs Solve?

Think of a complex task like baking a cake. You don’t just throw all the ingredients in the oven. There’s a sequence: mix dry ingredients, mix wet ingredients, combine them, pour into a pan, bake, cool, frost. Each step depends on the previous one.

LangGraph helps you define these steps and the order they should happen in. It provides a way to create a flowchart or a blueprint for your application’s logic.

The core idea is to break down your application into:

Nodes: These are the individual steps or actions (like “mix dry ingredients” or “call the LLM”).
Edges: These are the connections or transitions between the steps, defining the order (after mixing dry ingredients, mix wet ingredients).

LangGraph provides different types of graphs, but the most common and useful one for building stateful applications is the StateGraph.

Core Concepts: `Graph`, `StateGraph`, and `MessageGraph`

Let’s look at the main types of graphs you’ll encounter:

Graph (The Basic Blueprint)

This is the most fundamental type. You define nodes (steps) and edges (connections).
It’s like a basic flowchart diagram.
You explicitly define how information passes from one node to the next.
While foundational, you’ll often use the more specialized StateGraph for convenience.

# This is a conceptual example - we usually use StateGraph
from langgraph.graph import Graph

# Define simple functions or Runnables as nodes
def step_one(input_data):
    print("Running Step 1")
    return input_data * 2

def step_two(processed_data):
    print("Running Step 2")
    return processed_data + 5

# Create a basic graph
basic_graph_builder = Graph()

# Add nodes
basic_graph_builder.add_node("A", step_one)
basic_graph_builder.add_node("B", step_two)

# Add edges (connections)
basic_graph_builder.add_edge("A", "B") # Run B after A
basic_graph_builder.set_entry_point("A") # Start at A
# basic_graph_builder.set_finish_point("B") # Not needed for this simple Graph type

StateGraph (The Collaborative Whiteboard)
- This is the workhorse for most LangGraph applications. It’s a specialized Graph.
- Key Idea: Nodes communicate implicitly by reading from and writing to a shared State object.
- Analogy: Imagine a central whiteboard (the State). Each node (person) can read what’s on the whiteboard, do some work, and then update the whiteboard with new information or changes.
- You define the structure of this shared state first (e.g., what keys it holds).
- Each node receives the current state and returns a dictionary containing only the parts of the state it wants to update. LangGraph handles merging these updates into the main state.
MessageGraph (The Chatbot Specialist)
- This is a further specialization of StateGraph, designed specifically for building chatbots or conversational agents.
- It automatically manages a messages list within its state.
- Nodes typically take the current list of messages and return new messages to be added.
- It uses a special function (add_messages) to append messages while handling potential duplicates or updates based on message IDs. This makes building chat flows much simpler.

For the rest of this chapter, we’ll focus on StateGraph as it introduces the core concepts most clearly.

Building a Simple `StateGraph`

Let’s build a tiny application that takes a number, adds 1 to it, and then multiplies it by 2.

Step 1: Define the State

First, we define the “whiteboard” – the structure of the data our graph will work with. We use Python’s TypedDict for this.

from typing import TypedDict

class MyState(TypedDict):
    # Our state will hold a single number called 'value'
    value: int

This tells our StateGraph that the shared information will always contain an integer named value.

Step 2: Define the Nodes

Nodes are functions (or LangChain Runnables) that perform the work. They take the current State as input and return a dictionary containing the updates to the state.

# Node 1: Adds 1 to the value
def add_one(state: MyState) -> dict:
    print("--- Running Adder Node ---")
    current_value = state['value']
    new_value = current_value + 1
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return *only* the key we want to update
    return {"value": new_value}

# Node 2: Multiplies the value by 2
def multiply_by_two(state: MyState) -> dict:
    print("--- Running Multiplier Node ---")
    current_value = state['value']
    new_value = current_value * 2
    print(f"Input value: {current_value}, Output value: {new_value}")
    # Return the update
    return {"value": new_value}

Notice how each function takes state and returns a dict specifying which part of the state ("value") should be updated and with what new value.

Step 3: Create the Graph and Add Nodes/Edges

Now we assemble our blueprint using StateGraph.

from langgraph.graph import StateGraph, END, START

# Create a StateGraph instance linked to our state definition
workflow = StateGraph(MyState)

# Add the nodes to the graph
workflow.add_node("adder", add_one)
workflow.add_node("multiplier", multiply_by_two)

# Set the entry point --> where does the flow start?
workflow.set_entry_point("adder")

# Add edges --> how do the nodes connect?
workflow.add_edge("adder", "multiplier") # After adder, run multiplier

# Set the finish point --> where does the flow end?
# We use the special identifier END
workflow.add_edge("multiplier", END)

StateGraph(MyState): Creates the graph, telling it to use our MyState structure.
add_node("name", function): Registers our functions as steps in the graph with unique names.
set_entry_point("adder"): Specifies that the adder node should run first. This implicitly creates an edge from a special START point to adder.
add_edge("adder", "multiplier"): Creates a connection. After adder finishes, multiplier will run.
add_edge("multiplier", END): Specifies that after multiplier finishes, the graph execution should stop. END is a special marker for the graph’s conclusion.

Step 4: Compile the Graph

Before we can run it, we need to compile the graph. This finalizes the structure and makes it executable.

# Compile the workflow into an executable object
app = workflow.compile()

Step 5: Run It!

Now we can invoke our compiled graph (app) with some initial state.

# Define the initial state
initial_state = {"value": 5}

# Run the graph
final_state = app.invoke(initial_state)

# Print the final result
print("\n--- Final State ---")
print(final_state)

Expected Output:

--- Running Adder Node ---
Input value: 5, Output value: 6
--- Running Multiplier Node ---
Input value: 6, Output value: 12

--- Final State ---
{'value': 12}

As you can see, the graph executed the nodes in the defined order (adder then multiplier), automatically passing the updated state between them!

How Does `StateGraph` Work Under the Hood?

You defined the nodes and edges, but what actually happens when you call invoke()?

Initialization: LangGraph takes your initial input ({"value": 5}) and puts it onto the “whiteboard” (the internal state).
Execution Engine: A powerful internal component called the Pregel Execution Engine takes over. It looks at the current state and the graph structure.
Following Edges: It starts at the START node and follows the edge to the entry point (adder).
Node Execution: It runs the adder function, passing it the current state ({"value": 5}).
State Update: The adder function returns {"value": 6}. The Pregel engine uses special mechanisms called Channels to update the value associated with the "value" key on the “whiteboard”. The state is now {"value": 6}.
Next Step: The engine sees the edge from adder to multiplier.
Node Execution: It runs the multiplier function, passing it the updated state ({"value": 6}).
State Update: multiplier returns {"value": 12}. The engine updates the state again via the Channels. The state is now {"value": 12}.
Following Edges: The engine sees the edge from multiplier to END.
Finish: Reaching END signals the execution is complete. The final state ({"value": 12}) is returned.

Here’s a simplified visual:

sequenceDiagram
    participant User
    participant App (CompiledGraph)
    participant State
    participant AdderNode as adder
    participant MultiplierNode as multiplier

    User->>App: invoke({"value": 5})
    App->>State: Initialize state = {"value": 5}
    App->>AdderNode: Execute(state)
    AdderNode->>State: Read value (5)
    AdderNode-->>App: Return {"value": 6}
    App->>State: Update state = {"value": 6}
    App->>MultiplierNode: Execute(state)
    MultiplierNode->>State: Read value (6)
    MultiplierNode-->>App: Return {"value": 12}
    App->>State: Update state = {"value": 12}
    App->>User: Return final state {"value": 12}

Don’t worry too much about the details of Pregel or Channels yet – we’ll cover them in later chapters. The key takeaway is that StateGraph manages the state and orchestrates the execution based on your defined nodes and edges.

A Peek at the Code (`graph/state.py`, `graph/graph.py`)

Let’s briefly look at the code snippets provided to see how these concepts map to the implementation:

StateGraph.__init__ (graph/state.py):

# Simplified view
class StateGraph(Graph):
    def __init__(self, state_schema: Optional[Type[Any]] = None, ...):
        super().__init__()
        # ... stores the state_schema ...
        self.schema = state_schema
        # ... analyzes the schema to understand state keys and how to update them ...
        self._add_schema(state_schema)
        # ... sets up internal dictionaries for channels, nodes etc. ...

This code initializes the graph, crucially storing the state_schema you provide. It analyzes this schema to figure out the “keys” on your whiteboard (like "value") and sets up the internal structures (Channels) needed to manage updates to each key.

StateGraph.add_node (graph/state.py):

# Simplified view
def add_node(self, node: str, action: RunnableLike, ...):
    # ... basic checks for name conflicts, reserved names (START, END) ...
    if node in self.channels: # Cannot use a state key name as a node name
         raise ValueError(...)
    # ... wrap the provided action (function/runnable) ...
    runnable = coerce_to_runnable(action, ...)
    # ... store the node details (runnable, input type etc.) ...
    self.nodes[node] = StateNodeSpec(runnable, ..., input=input or self.schema, ...)
    return self

When you add a node, it stores the associated function (action) and links it to the provided node name. It also figures out what input schema the node expects (usually the main graph state schema).

Graph.add_edge (graph/graph.py):

# Simplified view from the base Graph class
def add_edge(self, start_key: str, end_key: str):
    # ... checks for invalid edges (e.g., starting from END) ...
    # ... basic validation ...
    # Stores the connection as a simple pair
    self.edges.add((start_key, end_key))
    return self

Adding an edge is relatively simple – it just records the (start_key, end_key) pair in a set, representing the connection.

StateGraph.compile (graph/state.py):

# Simplified view
def compile(self, ...):
    # ... validation checks ...
    self.validate(...)
    # ... create the CompiledStateGraph instance ...
    compiled = CompiledStateGraph(builder=self, ...)
    # ... add nodes, edges, branches to the compiled version ...
    for key, node in self.nodes.items():
        compiled.attach_node(key, node)
    for start, end in self.edges:
        compiled.attach_edge(start, end)
    # ... more setup for branches, entry/exit points ...
    # ... finalize and return the compiled graph ...
    return compiled.validate()

Compilation takes your defined nodes and edges and builds the final, executable CompiledStateGraph. It sets up the internal machinery (Pregel, Channels) based on your blueprint.

Conclusion

You’ve learned the fundamental concept in LangGraph: the Graph.

Graphs define the structure and flow of your application using Nodes (steps) and Edges (connections).
StateGraph is the most common type, where nodes communicate implicitly by reading and updating a shared State object (like a whiteboard).
MessageGraph is a specialized StateGraph for easily building chatbots.
You define the state structure, write node functions that update parts of the state, connect them with edges, and compile the graph to make it runnable.

Now that you understand how to define the overall structure of your application using StateGraph, the next step is to dive deeper into what constitutes a Node.

Let’s move on to Chapter 2: Nodes (PregelNode) to explore how individual steps are defined and executed.

Generated by AI Codebase Knowledge Builder

Chapter 1: Graph / StateGraph - The Blueprint of Your Application

What Problem Do Graphs Solve?

Core Concepts: Graph, StateGraph, and MessageGraph

Building a Simple StateGraph

How Does StateGraph Work Under the Hood?

A Peek at the Code (graph/state.py, graph/graph.py)

Conclusion

Core Concepts: `Graph`, `StateGraph`, and `MessageGraph`

Building a Simple `StateGraph`

How Does `StateGraph` Work Under the Hood?

A Peek at the Code (`graph/state.py`, `graph/graph.py`)