LangGraph implements a streaming system to surface real-time updates. Streaming is crucial for making LLM-based applications feel responsive: by displaying output progressively, even before a complete response is ready, it significantly improves the user experience (UX), especially given LLM latency. What you can do with LangGraph streaming:
  • Stream graph state: get state updates/values with the updates or values modes.
  • Stream subgraph outputs: include outputs from both the parent graph and any nested subgraphs.
  • Stream LLM tokens: capture token streams anywhere: inside nodes, subgraphs, or tools.
  • Stream custom data: send custom updates or progress signals directly from tool functions.
  • Use multiple streaming modes: choose from values (full state), updates (state deltas), messages (LLM tokens + metadata), custom (arbitrary user data), or debug (detailed traces).

Supported stream modes

Pass one or more of the following stream modes as a list to the stream() or astream() methods:
  • values: streams the full value of the state after each step of the graph.
  • updates: streams the updates to the state after each step of the graph. If multiple updates are made in the same step (e.g., multiple nodes are run), those updates are streamed separately.
  • custom: streams custom data from inside your graph nodes.
  • messages: streams 2-tuples (LLM token, metadata) from any graph nodes where an LLM is invoked.
  • debug: streams as much information as possible throughout the execution of the graph.

Basic usage example

LangGraph graphs expose the .stream() (sync) and .astream() (async) methods, which return streamed outputs as iterators.
for chunk in graph.stream(inputs, stream_mode="updates"):
    print(chunk)
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    topic: str
    joke: str

def refine_topic(state: State):
    return {"topic": state["topic"] + " and cats"}

def generate_joke(state: State):
    return {"joke": f"This is a joke about {state['topic']}"}

graph = (
    StateGraph(State)
    .add_node(refine_topic)
    .add_node(generate_joke)
    .add_edge(START, "refine_topic")
    .add_edge("refine_topic", "generate_joke")
    .add_edge("generate_joke", END)
    .compile()
)

# The stream() method returns an iterator that yields streamed outputs
for chunk in graph.stream(  
    {"topic": "ice cream"},
    # Set stream_mode="updates" to stream only the updates to the graph state after each node
    # Other stream modes are also available. See supported stream modes for details
    stream_mode="updates",  
):
    print(chunk)
{'refine_topic': {'topic': 'ice cream and cats'}}
{'generate_joke': {'joke': 'This is a joke about ice cream and cats'}}

Stream multiple modes

You can pass a list as the stream_mode parameter to stream multiple modes at once. The streamed outputs will be tuples (mode, chunk), where mode is the name of the stream mode and chunk is the data streamed by that mode.
for mode, chunk in graph.stream(inputs, stream_mode=["updates", "custom"]):
    print(chunk)
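
To handle each mode differently, branch on mode. A minimal sketch, assuming the graph emits both state updates and custom data:

for mode, chunk in graph.stream(inputs, stream_mode=["updates", "custom"]):
    if mode == "updates":
        # State deltas produced by each node
        print(f"State update: {chunk}")
    elif mode == "custom":
        # User-defined data emitted via the stream writer
        print(f"Custom data: {chunk}")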

Stream graph state

Use the updates or values stream modes to stream the state of the graph as it executes.
  • updates streams the state updates after each step of the graph.
  • values streams the full value of the state after each step of the graph.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    topic: str
    joke: str


def refine_topic(state: State):
    return {"topic": state["topic"] + " and cats"}


def generate_joke(state: State):
    return {"joke": f"This is a joke about {state['topic']}"}

graph = (
    StateGraph(State)
    .add_node(refine_topic)
    .add_node(generate_joke)
    .add_edge(START, "refine_topic")
    .add_edge("refine_topic", "generate_joke")
    .add_edge("generate_joke", END)
    .compile()
)
Use the updates mode to stream only the state updates returned by the nodes after each step. The streamed outputs include the name of the node as well as the update.
for chunk in graph.stream(
    {"topic": "ice cream"},
    stream_mode="updates",  
):
    print(chunk)
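
To stream the full state instead, use stream_mode="values" with the same graph; each chunk is the complete state after that step, starting from the input. A minimal sketch:

for chunk in graph.stream(
    {"topic": "ice cream"},
    stream_mode="values",
):
    print(chunk)

which prints output along these lines:
{'topic': 'ice cream'}
{'topic': 'ice cream and cats'}
{'topic': 'ice cream and cats', 'joke': 'This is a joke about ice cream and cats'}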

Stream subgraph outputs

To include outputs from subgraphs in the streamed output, set subgraphs=True on the parent graph's .stream() method. This streams outputs from both the parent graph and any subgraphs. Outputs are streamed as tuples (namespace, data), where namespace is a tuple with the path to the node where the subgraph is invoked, e.g. ("parent_node:<task_id>", "child_node:<task_id>").
for chunk in graph.stream(
    {"foo": "foo"},
    # Set subgraphs=True to stream outputs from subgraphs
    subgraphs=True,  
    stream_mode="updates",
):
    print(chunk)
from langgraph.graph import START, StateGraph
from typing import TypedDict

# Define subgraph
class SubgraphState(TypedDict):
    foo: str  # note that this key is shared with the parent graph state
    bar: str

def subgraph_node_1(state: SubgraphState):
    return {"bar": "bar"}

def subgraph_node_2(state: SubgraphState):
    return {"foo": state["foo"] + state["bar"]}

subgraph_builder = StateGraph(SubgraphState)
subgraph_builder.add_node(subgraph_node_1)
subgraph_builder.add_node(subgraph_node_2)
subgraph_builder.add_edge(START, "subgraph_node_1")
subgraph_builder.add_edge("subgraph_node_1", "subgraph_node_2")
subgraph = subgraph_builder.compile()

# Define parent graph
class ParentState(TypedDict):
    foo: str

def node_1(state: ParentState):
    return {"foo": "hi! " + state["foo"]}

builder = StateGraph(ParentState)
builder.add_node("node_1", node_1)
builder.add_node("node_2", subgraph)
builder.add_edge(START, "node_1")
builder.add_edge("node_1", "node_2")
graph = builder.compile()

for chunk in graph.stream(
    {"foo": "foo"},
    stream_mode="updates",
    # Set subgraphs=True to stream outputs from subgraphs
    subgraphs=True,  
):
    print(chunk)
((), {'node_1': {'foo': 'hi! foo'}})
(('node_2:dfddc4ba-c3c5-6887-5012-a243b5b377c2',), {'subgraph_node_1': {'bar': 'bar'}})
(('node_2:dfddc4ba-c3c5-6887-5012-a243b5b377c2',), {'subgraph_node_2': {'foo': 'hi! foobar'}})
((), {'node_2': {'foo': 'hi! foobar'}})
Note: we receive not just the node updates but also the namespaces, which tell us which graph (or subgraph) we are streaming from.

Debug

Use the debug stream mode to stream as much information as possible throughout the execution of the graph. The streamed outputs include the name of the node as well as the full state.
for chunk in graph.stream(
    {"topic": "ice cream"},
    stream_mode="debug",  
):
    print(chunk)

LLM tokens

Use the messages stream mode to stream LLM outputs token by token from anywhere in your graph, including nodes, tools, subgraphs, or tasks. The streamed output from messages mode is a tuple (message_chunk, metadata), where:
  • message_chunk: the token or message segment from the LLM.
  • metadata: a dictionary containing details about the graph node and the LLM invocation.
If your LLM is not available as a LangChain integration, you can use the custom mode to stream its outputs instead. See Use with any LLM for details.
Async with Python < 3.11 requires manual configuration: when using Python < 3.11 with async code, you must explicitly pass RunnableConfig to ainvoke() to enable proper streaming. See Async with Python < 3.11 for details, or upgrade to Python 3.11+.
from dataclasses import dataclass

from langchain.chat_models import init_chat_model
from langgraph.graph import StateGraph, START


@dataclass
class MyState:
    topic: str
    joke: str = ""


model = init_chat_model(model="openai:gpt-4o-mini")

def call_model(state: MyState):
    """Call the LLM to generate a joke about a topic"""
    # Note that message events are emitted even when the LLM is run using .invoke rather than .stream
    model_response = model.invoke(  
        [
            {"role": "user", "content": f"Generate a joke about {state.topic}"}
        ]
    )
    return {"joke": model_response.content}

graph = (
    StateGraph(MyState)
    .add_node(call_model)
    .add_edge(START, "call_model")
    .compile()
)

# The "messages" stream mode returns an iterator of tuples (message_chunk, metadata)
# where message_chunk is the token streamed by the LLM and metadata is a dictionary
# with information about the graph node where the LLM was called and other information
for message_chunk, metadata in graph.stream(
    {"topic": "ice cream"},
    stream_mode="messages",  
):
    if message_chunk.content:
        print(message_chunk.content, end="|", flush=True)

Filter by LLM invocation

You can associate tags with LLM invocations to filter the streamed tokens by LLM invocation.
from langchain.chat_models import init_chat_model

# model_1 is tagged with "joke"
model_1 = init_chat_model(model="openai:gpt-4o-mini", tags=['joke'])
# model_2 is tagged with "poem"
model_2 = init_chat_model(model="openai:gpt-4o-mini", tags=['poem'])

graph = ... # define a graph that uses these LLMs

# The stream_mode is set to "messages" to stream LLM tokens
# The metadata contains information about the LLM invocation, including the tags
async for msg, metadata in graph.astream(
    {"topic": "cats"},
    stream_mode="messages",  
):
    # Filter the streamed tokens by the tags field in the metadata to only include
    # the tokens from the LLM invocation with the "joke" tag
    if metadata["tags"] == ["joke"]:
        print(msg.content, end="|", flush=True)
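
Note that metadata["tags"] is a list, so the equality check above only matches invocations tagged with exactly ["joke"]. If your invocations may carry multiple tags, a membership test is more robust:

if "joke" in metadata.get("tags", []):
    print(msg.content, end="|", flush=True)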
from typing import TypedDict

from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph

# The joke_model is tagged with "joke"
joke_model = init_chat_model(model="openai:gpt-4o-mini", tags=["joke"])
# The poem_model is tagged with "poem"
poem_model = init_chat_model(model="openai:gpt-4o-mini", tags=["poem"])


class State(TypedDict):
    topic: str
    joke: str
    poem: str


async def call_model(state, config):
    topic = state["topic"]
    print("Writing joke...")
    # Note: passing the config through explicitly is required for Python < 3.11,
    # since context var support was not added until then:
    # https://docs.python.org/3/library/asyncio-task.html#creating-tasks
    # See the async section for more details.
    joke_response = await joke_model.ainvoke(
        [{"role": "user", "content": f"Write a joke about {topic}"}],
        config,
    )
    print("\n\nWriting poem...")
    poem_response = await poem_model.ainvoke(
        [{"role": "user", "content": f"Write a short poem about {topic}"}],
        config,
    )
    return {"joke": joke_response.content, "poem": poem_response.content}


graph = (
    StateGraph(State)
    .add_node(call_model)
    .add_edge(START, "call_model")
    .compile()
)

# The stream_mode is set to "messages" to stream LLM tokens
# The metadata contains information about the LLM invocation, including the tags
async for msg, metadata in graph.astream(
    {"topic": "cats"},
    stream_mode="messages",
):
    if metadata["tags"] == ["joke"]:
        print(msg.content, end="|", flush=True)

Filter by node

To stream tokens only from specific nodes, use stream_mode="messages" and filter the outputs by the langgraph_node field in the streamed metadata:
# The "messages" stream mode returns a tuple of (message_chunk, metadata)
# where message_chunk is the token streamed by the LLM and metadata is a dictionary
# with information about the graph node where the LLM was called and other information
for msg, metadata in graph.stream(
    inputs,
    stream_mode="messages",  
):
    # Filter the streamed tokens by the langgraph_node field in the metadata
    # to only include the tokens from the specified node
    if msg.content and metadata["langgraph_node"] == "some_node_name":
        ...
from typing import TypedDict
from langgraph.graph import START, StateGraph
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")


class State(TypedDict):
    topic: str
    joke: str
    poem: str


def write_joke(state: State):
    topic = state["topic"]
    joke_response = model.invoke(
        [{"role": "user", "content": f"Write a joke about {topic}"}]
    )
    return {"joke": joke_response.content}


def write_poem(state: State):
    topic = state["topic"]
    poem_response = model.invoke(
        [{"role": "user", "content": f"Write a short poem about {topic}"}]
    )
    return {"poem": poem_response.content}


graph = (
    StateGraph(State)
    .add_node(write_joke)
    .add_node(write_poem)
    # write both the joke and the poem concurrently
    .add_edge(START, "write_joke")
    .add_edge(START, "write_poem")
    .compile()
)

# The "messages" stream mode returns a tuple of (message_chunk, metadata)
# where message_chunk is the token streamed by the LLM and metadata is a dictionary
# with information about the graph node where the LLM was called and other information
for msg, metadata in graph.stream(
    {"topic": "cats"},
    stream_mode="messages",  
):
    # Filter the streamed tokens by the langgraph_node field in the metadata
    # to only include the tokens from the write_poem node
    if msg.content and metadata["langgraph_node"] == "write_poem":
        print(msg.content, end="|", flush=True)

Stream custom data

To send custom user-defined data from inside a LangGraph node or tool:
  1. Use get_stream_writer to access the stream writer and emit custom data.
  2. Set stream_mode="custom" when calling .stream() or .astream() to receive the custom data in the stream. You can combine multiple modes (e.g., ["updates", "custom"]), but at least one must be "custom".
No get_stream_writer in async code on Python < 3.11: in async code running on Python < 3.11, get_stream_writer will not work. Instead, add a writer parameter to your node or tool and pass it manually. See Async with Python < 3.11 for a usage example.
from typing import TypedDict
from langgraph.config import get_stream_writer
from langgraph.graph import StateGraph, START

class State(TypedDict):
    query: str
    answer: str

def node(state: State):
    # Get the stream writer to send custom data
    writer = get_stream_writer()
    # Emit a custom key-value pair (e.g., progress update)
    writer({"custom_key": "Generating custom data inside node"})
    return {"answer": "some data"}

graph = (
    StateGraph(State)
    .add_node(node)
    .add_edge(START, "node")
    .compile()
)

inputs = {"query": "example"}

# Set stream_mode="custom" to receive the custom data in the stream
for chunk in graph.stream(inputs, stream_mode="custom"):
    print(chunk)
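
The same pattern works from inside tools. A minimal sketch, assuming a hypothetical query_database tool defined with LangChain's @tool decorator:

from langchain_core.tools import tool
from langgraph.config import get_stream_writer

@tool
def query_database(query: str) -> str:
    """Query the database."""
    # Get the stream writer to emit progress updates as custom data
    writer = get_stream_writer()
    writer({"type": "progress", "data": "Starting query..."})
    # ... run the actual query here ...
    writer({"type": "progress", "data": "Query finished."})
    return "some result"

Consume the chunks with stream_mode="custom", exactly as in the node example above.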

Use with any LLM

You can use stream_mode="custom" to stream data from any LLM API, even if that API does not implement the LangChain chat model interface. This lets you integrate raw LLM clients or external services that provide their own streaming interfaces, making LangGraph highly flexible for custom setups.
from langgraph.config import get_stream_writer
from langgraph.graph import StateGraph

def call_arbitrary_model(state):
    """Example node that calls an arbitrary model and streams the output"""
    # Get the stream writer to send custom data
    writer = get_stream_writer()  
    # Assume you have a streaming client that yields chunks
    # Generate LLM tokens using your custom streaming client
    for chunk in your_custom_streaming_client(state["topic"]):
        # Use the writer to send custom data to the stream
        writer({"custom_llm_chunk": chunk})  
    return {"result": "completed"}

graph = (
    StateGraph(State)
    .add_node(call_arbitrary_model)
    # Add other nodes and edges as needed
    .compile()
)
# Set stream_mode="custom" to receive the custom data in the stream
for chunk in graph.stream(
    {"topic": "cats"},
    stream_mode="custom",
):
    # The chunk will contain the custom data streamed from the llm
    print(chunk)
import operator
import json

from typing import TypedDict
from typing_extensions import Annotated
from langgraph.config import get_stream_writer
from langgraph.graph import StateGraph, START

from openai import AsyncOpenAI

openai_client = AsyncOpenAI()
model_name = "gpt-4o-mini"


async def stream_tokens(model_name: str, messages: list[dict]):
    response = await openai_client.chat.completions.create(
        messages=messages, model=model_name, stream=True
    )
    role = None
    async for chunk in response:
        delta = chunk.choices[0].delta

        if delta.role is not None:
            role = delta.role

        if delta.content:
            yield {"role": role, "content": delta.content}


# this is our tool
async def get_items(place: str) -> str:
    """Use this tool to list items one might find in a place you're asked about."""
    writer = get_stream_writer()
    response = ""
    async for msg_chunk in stream_tokens(
        model_name,
        [
            {
                "role": "user",
                "content": (
                    "Can you tell me what kind of items "
                    f"i might find in the following place: '{place}'. "
                    "List at least 3 such items separating them by a comma. "
                    "And include a brief description of each item."
                ),
            }
        ],
    ):
        response += msg_chunk["content"]
        writer(msg_chunk)

    return response


class State(TypedDict):
    messages: Annotated[list[dict], operator.add]


# this is the tool-calling graph node
async def call_tool(state: State):
    ai_message = state["messages"][-1]
    tool_call = ai_message["tool_calls"][-1]

    function_name = tool_call["function"]["name"]
    if function_name != "get_items":
        raise ValueError(f"Tool {function_name} not supported")

    function_arguments = tool_call["function"]["arguments"]
    arguments = json.loads(function_arguments)

    function_response = await get_items(**arguments)
    tool_message = {
        "tool_call_id": tool_call["id"],
        "role": "tool",
        "name": function_name,
        "content": function_response,
    }
    return {"messages": [tool_message]}


graph = (
    StateGraph(State)
    .add_node(call_tool)
    .add_edge(START, "call_tool")
    .compile()
)
Let's invoke the graph with an AIMessage that includes a tool call:
inputs = {
    "messages": [
        {
            "content": None,
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "1",
                    "function": {
                        "arguments": '{"place":"bedroom"}',
                        "name": "get_items",
                    },
                    "type": "function",
                }
            ],
        }
    ]
}

async for chunk in graph.astream(
    inputs,
    stream_mode="custom",
):
    print(chunk["content"], end="|", flush=True)

Disable streaming for specific chat models

If your application mixes models that support streaming with models that do not, you may need to explicitly disable streaming for the models that do not support it. Set disable_streaming=True when initializing the model.
from langchain.chat_models import init_chat_model

model = init_chat_model(
    "anthropic:claude-sonnet-4-5",
    # Set disable_streaming=True to disable streaming for the chat model
    disable_streaming=True,
)

Async with Python < 3.11

In Python < 3.11, asyncio tasks do not support the context parameter. This limits LangGraph's ability to automatically propagate context and affects two key aspects of LangGraph's streaming mechanism:
  1. You must explicitly pass RunnableConfig to async LLM calls (e.g., ainvoke()), since callbacks are not propagated automatically.
  2. You cannot use get_stream_writer in async nodes or tools; instead, pass a writer argument directly.
from typing import TypedDict
from langgraph.graph import START, StateGraph
from langchain.chat_models import init_chat_model

model = init_chat_model(model="openai:gpt-4o-mini")

class State(TypedDict):
    topic: str
    joke: str

# Accept config as an argument in the async node function
async def call_model(state, config):
    topic = state["topic"]
    print("Generating joke...")
    # Pass config to model.ainvoke() to ensure proper context propagation
    joke_response = await model.ainvoke(  
        [{"role": "user", "content": f"Write a joke about {topic}"}],
        config,
    )
    return {"joke": joke_response.content}

graph = (
    StateGraph(State)
    .add_node(call_model)
    .add_edge(START, "call_model")
    .compile()
)

# Set stream_mode="messages" to stream LLM tokens
async for chunk, metadata in graph.astream(
    {"topic": "ice cream"},
    stream_mode="messages",  
):
    if chunk.content:
        print(chunk.content, end="|", flush=True)
from typing import TypedDict
from langgraph.graph import START, StateGraph
from langgraph.types import StreamWriter

class State(TypedDict):
    topic: str
    joke: str

# Add writer as an argument in the function signature of the async node or tool
# LangGraph will automatically pass the stream writer to the function
async def generate_joke(state: State, writer: StreamWriter):
    writer({"custom_key": "Streaming custom data while generating a joke"})
    return {"joke": f"This is a joke about {state['topic']}"}

graph = (
    StateGraph(State)
    .add_node(generate_joke)
    .add_edge(START, "generate_joke")
    .compile()
)

# Set stream_mode="custom" to receive the custom data in the stream
async for chunk in graph.astream(
    {"topic": "ice cream"},
    stream_mode="custom",
):
    print(chunk)
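
Running this prints the custom chunk emitted by the node:
{'custom_key': 'Streaming custom data while generating a joke'}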