Models - Docs by LangChain

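Large language models (LLMs) are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize, and answer questions without specialized training for each task. Beyond text generation, many models also support:

Tool calling - calling external tools (such as database queries or API calls) and using the results in their responses.
Structured output - constraining the model's response to follow a predefined format.
Multimodality - processing and returning data other than text, such as images, audio, and video.
Reasoning - performing multi-step reasoning to arrive at a conclusion.

Models are the reasoning engine of agents. They drive the agent's decision-making: which tools to call, how to interpret the results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect the reliability and performance of your agent. Different models excel at different tasks: some are better at following complex instructions, others at structured reasoning, and some support larger context windows for handling more information. LangChain's standard model interface supports integrations with many providers, which makes it easy to experiment with and switch between models to find the best fit for your use case.

For provider-specific integration information and capabilities, see the provider's integration page.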

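Basic usage

Models can be used in two ways:

With agents - a model can be specified dynamically when creating an agent.
Standalone - a model can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction, without an agent framework.

The same model interface works in both contexts, which gives you the flexibility to start simple and scale up to more complex agent-based workflows as needed.

Initialize a model

The easiest way to get started with a standalone model in LangChain is to use init_chat_model to initialize one from a provider of your choice (examples below):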

OpenAI
Anthropic
Azure
Google Gemini
AWS Bedrock

👉 Read the OpenAI chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("openai:gpt-4.1")

👉 Read the Anthropic chat model integration docs

pip install -U "langchain[anthropic]"

import os
from langchain.chat_models import init_chat_model

os.environ["ANTHROPIC_API_KEY"] = "sk-..."

model = init_chat_model("anthropic:claude-sonnet-4-5")

👉 Read the Azure chat model integration docs

pip install -U "langchain[openai]"

import os
from langchain.chat_models import init_chat_model

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2025-03-01-preview"

model = init_chat_model(
    "azure_openai:gpt-4.1",
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
)

👉 Read the Google GenAI chat model integration docs

pip install -U "langchain[google-genai]"

import os
from langchain.chat_models import init_chat_model

os.environ["GOOGLE_API_KEY"] = "..."

model = init_chat_model("google_genai:gemini-2.5-flash-lite")

👉 Read the AWS Bedrock chat model integration docs

pip install -U "langchain[aws]"

from langchain.chat_models import init_chat_model

# Follow the steps here to configure your credentials:
# https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

model = init_chat_model(
    "anthropic.claude-3-5-sonnet-20240620-v1:0",
    model_provider="bedrock_converse",
)

response = model.invoke("Why do parrots talk?")

See init_chat_model for more detail, including how to pass model parameters.

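Key methods

Invoke

The model takes messages as input and outputs messages after generating a complete response.

Stream

Invoke the model, but stream the output as it is generated in real time.

Batch

Send multiple requests to a model in a batch for more efficient processing.

In addition to chat models, LangChain supports other adjacent technologies, such as embedding models and vector stores. See the integrations page for details.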

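Parameters

A chat model takes parameters that configure its behavior. The full set of supported parameters varies by model and provider, but standard ones include: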

model

string

required

The name or identifier of the specific model you want to use with a provider.

api_key

string

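The key required for authenticating with the model provider. It is usually issued when you sign up for access to the model, and is often supplied by setting an environment variable.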

temperature

number

Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic.

timeout

number

The maximum time (in seconds) to wait for a response from the model before canceling the request.

max_tokens

number

Limits the total number of tokens in the response, effectively controlling how long the output can be.

max_retries

number

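The maximum number of attempts the system will make to resend a request if it fails due to issues such as network timeouts or rate limits.

Using init_chat_model, pass these parameters as inline keyword arguments: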

Initialize using model parameters

model = init_chat_model(
    "anthropic:claude-sonnet-4-5",
    # Kwargs passed to the model:
    temperature=0.7,
    timeout=30,
    max_tokens=1000,
)

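Each chat model integration may have additional parameters that control provider-specific functionality. For example, ChatOpenAI has a use_responses_api parameter that determines whether to use the OpenAI Responses API or the Completions API. To find all the parameters supported by a given chat model, see the chat model integrations page.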


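Invocation

A chat model must be invoked to generate output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.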

Single message

response = model.invoke("Why do parrots have colorful feathers?")
print(response)

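A list of messages can be provided to a model to represent conversation history. Each message has a role that the model uses to identify who sent it in the conversation. See the messages guide for more detail on roles, types, and content.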

Dictionary format

from langchain.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Translate: I love programming."},
    {"role": "assistant", "content": "J'adore la programmation."},
    {"role": "user", "content": "Translate: I love building applications."}
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

Message objects

from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    SystemMessage("You are a helpful assistant that translates English to French."),
    HumanMessage("Translate: I love programming."),
    AIMessage("J'adore la programmation."),
    HumanMessage("Translate: I love building applications.")
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

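Stream

Most models can stream their output while it is being generated. By displaying output progressively, streaming significantly improves the user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real time: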

for chunk in model.stream("Why do parrots have colorful feathers?"):
    print(chunk.text, end="|", flush=True)

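Unlike invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be gathered into a full message via summation: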

Construct an AIMessage

full = None  # None | AIMessageChunk
for chunk in model.stream("What color is the sky?"):
    full = chunk if full is None else full + chunk
    print(full.text)

# The
# The sky
# The sky is
# The sky is typically
# The sky is typically blue
# ...

print(full.content_blocks)
# [{"type": "text", "text": "The sky is typically blue..."}]

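The resulting message can be treated the same as a message generated with invoke(). For example, it can be aggregated into a message history and passed back to the model as conversational context.

Streaming only works if every step in the program knows how to process a stream of chunks. For instance, an application that is not streaming-capable would need to hold the entire output in memory before it could process it.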

Advanced streaming topics

"Auto-streaming" chat models

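LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you are not explicitly calling the streaming methods. This is particularly useful when you use the non-streaming invoke method but still want to stream the entire application, including intermediate results from the chat model. In LangGraph agents, for example, you can call model.invoke() within nodes, and LangChain will automatically delegate to streaming if the graph is running in streaming mode.

How it works

When you invoke() a chat model, LangChain automatically switches to an internal streaming mode if it detects that you are trying to stream the overall application. The result of the call is the same as far as the code using invoke is concerned; however, while the chat model is being streamed, LangChain takes care of invoking on_llm_new_token events in its callback system. These callback events allow LangGraph stream() and astream_events() to surface the chat model's output in real time.

Streaming events

LangChain chat models can also stream semantic events using astream_events(). This simplifies filtering based on event types and other metadata, and aggregates the full message in the background. See the example below.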

async for event in model.astream_events("Hello"):

    if event["event"] == "on_chat_model_start":
        print(f"Input: {event['data']['input']}")

    elif event["event"] == "on_chat_model_stream":
        print(f"Token: {event['data']['chunk'].text}")

    elif event["event"] == "on_chat_model_end":
        print(f"Full message: {event['data']['output'].text}")

    else:
        pass

Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?

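See the astream_events() reference for event types and other details.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, because the processing can be done in parallel: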

Batch

responses = model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
])
for response in responses:
    print(response)

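This section describes the chat model method batch(), which parallelizes model calls on the client side. It is distinct from the batch APIs offered by inference providers such as OpenAI or Anthropic.

By default, batch() only returns the final output for the entire batch. To receive the output for each individual input as it finishes generating, you can stream results with batch_as_completed():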

Yield batch responses upon completion

for response in model.batch_as_completed([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
]):
    print(response)

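When using batch_as_completed(), results may arrive out of order. Each result includes the input index, which you can use to match results to inputs and reconstruct the original order if needed.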
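As a small illustration of reconstructing the original order, the sketch below sorts (index, output) pairs by their index. The restore_order helper and the sample pairs are hypothetical stand-ins for what batch_as_completed() yields; no model is called here.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def restore_order(indexed_results: Iterable[tuple[int, T]]) -> list[T]:
    """Return outputs sorted by the input index they were paired with."""
    return [output for _, output in sorted(indexed_results, key=lambda pair: pair[0])]


# Hypothetical stand-in for pairs arriving out of order from batch_as_completed().
completed = [(2, "answer C"), (0, "answer A"), (1, "answer B")]
ordered = restore_order(completed)
print(ordered)
```

In real use you would pass the iterator returned by batch_as_completed() directly to such a helper, at the cost of waiting for every result before reading any of them.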

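When processing a large number of inputs with batch() or batch_as_completed(), you may want to control the maximum number of parallel calls. You can do so by setting the max_concurrency attribute in the RunnableConfig dictionary.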

Batch with max concurrency

model.batch(
    list_of_inputs,
    config={
        'max_concurrency': 5,  # Limit to 5 parallel calls
    }
)

See the RunnableConfig reference for a full list of supported attributes.

For more details on batching, see the reference.
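To build intuition for what a concurrency cap does on the client side, here is a standard-library sketch. It is not LangChain's implementation: fake_model_call is a hypothetical stand-in for a network call, and MAX_CONCURRENCY plays the role that max_concurrency plays in the config above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 2  # plays the role of max_concurrency

_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of calls observed in flight at once


def fake_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a network call to a model."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.01)  # simulate request latency
    with _lock:
        _active -= 1
    return f"answer to: {prompt}"


prompts = ["q1", "q2", "q3", "q4", "q5"]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    # map() returns results in input order even though the calls overlap.
    responses = list(pool.map(fake_model_call, prompts))

print(responses)
print(f"peak concurrent calls: {peak_active}")
```

A lower cap trades throughput for a smaller chance of hitting provider rate limits.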

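Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. A tool is a pairing of:

A schema, including the tool's name, a description, and/or argument definitions (often a JSON schema)
A function or coroutine to execute.

You may hear the term "function calling". We use it interchangeably with "tool calling".

To make tools that you have defined available to a model, you must bind them using bind_tools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled via model or invocation parameters (e.g. ChatOpenAI, ChatAnthropic). Check the respective provider reference for details.

See the tools guide for details and other options for creating tools.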

Binding user tools

from langchain.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return f"It's sunny in {location}."


model_with_tools = model.bind_tools([get_weather])  

response = model_with_tools.invoke("What's the weather like in Boston?")
for tool_call in response.tool_calls:
    # View tool calls made by the model
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")

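When binding user-defined tools, the model's response includes a request to execute a tool. When you use a model separately from an agent, it is up to you to perform the requested action and return the result to the model for use in subsequent reasoning. Note that when you use an agent, the agent loop handles the tool execution loop for you. Below, we show some common ways to use tool calling.

Tool execution loop

When a model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here is a simple example of how to do this: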

Tool execution loop

# Bind (potentially multiple) tools to the model
model_with_tools = model.bind_tools([get_weather])

# Step 1: Model generates tool calls
messages = [{"role": "user", "content": "What's the weather in Boston?"}]
ai_msg = model_with_tools.invoke(messages)
messages.append(ai_msg)

# Step 2: Execute tools and collect results
for tool_call in ai_msg.tool_calls:
    # Execute the tool with the generated arguments
    tool_result = get_weather.invoke(tool_call)
    messages.append(tool_result)

# Step 3: Pass results back to model for final response
final_response = model_with_tools.invoke(messages)
print(final_response.text)
# "The current weather in Boston is 72°F and sunny."

Each ToolMessage returned by a tool includes a tool_call_id that matches the original tool call, which helps the model correlate results with requests.
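The pairing of results with requests can be sketched without any model at all. The example below uses plain Python functions rather than @tool objects, a hypothetical TOOLS registry, and a hypothetical get_time tool; it also assumes that a dictionary with role "tool" and a tool_call_id is an acceptable way to represent a tool result, in the same spirit as the dictionary-format messages shown earlier.

```python
from typing import Any, Callable


def get_weather(location: str) -> str:
    return f"It's sunny in {location}."


def get_time(location: str) -> str:  # hypothetical second tool
    return f"It's noon in {location}."


# Hypothetical registry mapping tool names to plain callables.
TOOLS: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "get_time": get_time,
}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Execute each requested tool and echo its call id back with the result."""
    results = []
    for call in tool_calls:
        tool = TOOLS[call["name"]]
        content = tool(**call["args"])
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return results


# Shaped like the entries of response.tool_calls shown in this section.
tool_calls = [
    {"name": "get_weather", "args": {"location": "Boston"}, "id": "call_1"},
    {"name": "get_time", "args": {"location": "Tokyo"}, "id": "call_2"},
]
tool_messages = run_tool_calls(tool_calls)
print(tool_messages)
```

Because every result carries the id of the call that produced it, the order in which tools finish does not matter.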

Forcing tool calls

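By default, the model is free to choose which bound tool to use based on the user's input. However, you may want to force a tool choice, ensuring that the model uses either a particular tool or any tool from a given list: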

model_with_tools = model.bind_tools([tool_1], tool_choice="any")

Parallel tool calls

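Many models support calling multiple tools in parallel when appropriate. This lets the model gather information from different sources at the same time.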

Parallel tool calls

model_with_tools = model.bind_tools([get_weather])

response = model_with_tools.invoke(
    "What's the weather in Boston and Tokyo?"
)


# The model may generate multiple tool calls
print(response.tool_calls)
# [
#   {'name': 'get_weather', 'args': {'location': 'Boston'}, 'id': 'call_1'},
#   {'name': 'get_weather', 'args': {'location': 'Tokyo'}, 'id': 'call_2'},
# ]


# Execute all tools (can be done in parallel with async)
results = []
for tool_call in response.tool_calls:
    if tool_call['name'] == 'get_weather':
        result = get_weather.invoke(tool_call)
    ...
    results.append(result)

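The model decides when parallel execution is appropriate based on the independence of the requested operations.

Most models that support tool calling enable parallel tool calls by default. Some, including OpenAI and Anthropic, allow you to disable this feature. To do so, set parallel_tool_calls=False: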

model.bind_tools([get_weather], parallel_tool_calls=False)

Streaming tool calls

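When streaming responses, tool calls are built up progressively through ToolCallChunk. This lets you see tool calls as they are being generated rather than waiting for the complete response.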

Streaming tool calls

for chunk in model_with_tools.stream(
    "What's the weather in Boston and Tokyo?"
):
    # Tool call chunks arrive progressively
    for tool_chunk in chunk.tool_call_chunks:
        if name := tool_chunk.get("name"):
            print(f"Tool: {name}")
        if id_ := tool_chunk.get("id"):
            print(f"ID: {id_}")
        if args := tool_chunk.get("args"):
            print(f"Args: {args}")

# Output:
# Tool: get_weather
# ID: call_SvMlU1TVIZugrFLckFE2ceRE
# Args: {"lo
# Args: catio
# Args: n": "B
# Args: osto
# Args: n"}
# Tool: get_weather
# ID: call_QMZdy6qInx13oWKE7KhuhOLR
# Args: {"lo
# Args: catio
# Args: n": "T
# Args: okyo
# Args: "}

You can accumulate chunks to build complete tool calls:

Accumulate tool calls

gathered = None
for chunk in model_with_tools.stream("What's the weather in Boston?"):
    gathered = chunk if gathered is None else gathered + chunk
    print(gathered.tool_calls)
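To make the accumulation step concrete, the following standard-library sketch joins argument fragments like the ones printed in the streaming output above and parses them once complete. The chunk dictionaries and the merge_tool_call_chunks helper are hypothetical; in practice, adding AIMessageChunk objects does this work for you.

```python
import json
from typing import Any, Optional

# Hypothetical fragments shaped like the streamed tool call chunks above.
chunks: list[dict[str, Any]] = [
    {"index": 0, "name": "get_weather", "id": "call_1", "args": ""},
    {"index": 0, "name": None, "id": None, "args": '{"lo'},
    {"index": 0, "name": None, "id": None, "args": "catio"},
    {"index": 0, "name": None, "id": None, "args": 'n": "B'},
    {"index": 0, "name": None, "id": None, "args": "osto"},
    {"index": 0, "name": None, "id": None, "args": 'n"}'},
]


def merge_tool_call_chunks(parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group fragments by index, concatenate the args strings, then parse JSON."""
    merged: dict[int, dict[str, Optional[str]]] = {}
    for part in parts:
        slot = merged.setdefault(part["index"], {"name": None, "id": None, "args": ""})
        slot["name"] = slot["name"] or part["name"]
        slot["id"] = slot["id"] or part["id"]
        slot["args"] = (slot["args"] or "") + (part["args"] or "")
    return [
        {"name": slot["name"], "id": slot["id"], "args": json.loads(slot["args"] or "{}")}
        for _, slot in sorted(merged.items())
    ]


calls = merge_tool_call_chunks(chunks)
print(calls)
```

The args string is only valid JSON once the last fragment has arrived, which is why parsing happens after merging rather than per chunk.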

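Structured output

Models can be asked to provide their response in a format that matches a given schema. This helps ensure the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.

Pydantic
TypedDict
JSON Schema

Pydantic models provide the richest feature set, including field validation, descriptions, and nested structures.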

from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # Movie(title="Inception", year=2010, director="Christopher Nolan", rating=8.8)

TypedDict provides a simpler alternative using Python's built-in typing, ideal when you don't need runtime validation.

from typing_extensions import TypedDict, Annotated

class MovieDict(TypedDict):
    """A movie with details."""
    title: Annotated[str, ..., "The title of the movie"]
    year: Annotated[int, ..., "The year the movie was released"]
    director: Annotated[str, ..., "The director of the movie"]
    rating: Annotated[float, ..., "The movie's rating out of 10"]

model_with_structure = model.with_structured_output(MovieDict)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.8}

For maximum control or interoperability, you can provide a raw JSON Schema.

import json

json_schema = {
    "title": "Movie",
    "description": "A movie with details",
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title of the movie"
        },
        "year": {
            "type": "integer",
            "description": "The year the movie was released"
        },
        "director": {
            "type": "string",
            "description": "The director of the movie"
        },
        "rating": {
            "type": "number",
            "description": "The movie's rating out of 10"
        }
    },
    "required": ["title", "year", "director", "rating"]
}

model_with_structure = model.with_structured_output(
    json_schema,
    method="json_schema",
)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # {'title': 'Inception', 'year': 2010, ...}

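Key considerations for structured output:

Method parameter: some providers support different methods ('json_schema', 'function_calling', 'json_mode')
- 'json_schema' typically refers to dedicated structured output features offered by a provider
- 'function_calling' derives structured output by forcing a tool call that follows the given schema
- 'json_mode' is a precursor to 'json_schema' offered by some providers - it generates valid JSON, but the schema must be described in the prompt
Include raw: use include_raw=True to get both the parsed output and the raw AI message
Validation: Pydantic models provide automatic validation, while TypedDict and JSON Schema require manual validation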
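As a rough idea of what manual validation could look like for dictionary output, here is a minimal standard-library sketch. The MOVIE_FIELDS table and validate_movie helper are hypothetical and check only the presence and basic type of each field; a dedicated JSON Schema validator would be more thorough.

```python
from typing import Any

# Hypothetical field table mirroring the movie schema above.
MOVIE_FIELDS: dict[str, tuple[type, ...]] = {
    "title": (str,),
    "year": (int,),
    "director": (str,),
    "rating": (int, float),
}


def validate_movie(data: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the dict looks valid."""
    errors = []
    for field, allowed in MOVIE_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], allowed):
            errors.append(f"wrong type for field: {field}")
    return errors


good = {"title": "Inception", "year": 2010, "director": "Christopher Nolan", "rating": 8.8}
bad = {"title": "Inception", "year": "2010", "rating": 8.8}

print(validate_movie(good))
print(validate_movie(bad))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a response in a single pass.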

Example: Message output alongside parsed structure

It can be useful to return the raw AIMessage object alongside the parsed representation, for example to access response metadata such as token counts. To do this, set [include_raw=True](https://reference.langchain.com/python/langchain_core/language_models/#langchain_core.language_models.chat_models.BaseChatModel.with_structured_output(include_raw)) when calling with_structured_output:

from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(..., description="The title of the movie")
    year: int = Field(..., description="The year the movie was released")
    director: str = Field(..., description="The director of the movie")
    rating: float = Field(..., description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie, include_raw=True)  
response = model_with_structure.invoke("Provide details about the movie Inception")
response
# {
#     "raw": AIMessage(...),
#     "parsed": Movie(title=..., year=..., ...),
#     "parsing_error": None,
# }

Example: Nested structures

Schemas can be nested:

from pydantic import BaseModel, Field

class Actor(BaseModel):
    name: str
    role: str

class MovieDetails(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    genres: list[str]
    budget: float | None = Field(None, description="Budget in millions USD")

model_with_structure = model.with_structured_output(MovieDetails)

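Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For a full list of supported models in LangChain, see the integrations page.

Advanced topics

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.

All LangChain chat models with underlying multimodal capabilities support:

Data in the cross-provider standard format (see our messages guide)
OpenAI chat completions format
Any format native to that specific provider (e.g., Anthropic models accept Anthropic native format)

See the multimodal section of the messages guide for details. Some models can return multimodal data as part of their response. When invoked to do so, the resulting AIMessage will have content blocks with multimodal types.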

Multimodal output

response = model.invoke("Create a picture of a cat")
print(response.content_blocks)
# [
#     {"type": "text", "text": "Here's a picture of a cat"},
#     {"type": "image", "base64": "...", "mime_type": "image/jpeg"},
# ]

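See the integrations page for details on specific providers.

Reasoning

Newer models can perform multi-step reasoning to arrive at a conclusion. This involves breaking complex problems down into smaller, more manageable steps. **If supported by the underlying model,** you can surface this reasoning process to better understand how the model arrived at its final answer.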

for chunk in model.stream("Why do parrots have colorful feathers?"):
    reasoning_steps = [r for r in chunk.content_blocks if r["type"] == "reasoning"]
    print(reasoning_steps if reasoning_steps else chunk.text)

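Depending on the model, you can sometimes specify how much effort it should put into reasoning. Similarly, you can request that the model turn off reasoning entirely. This may take the form of categorical "tiers" of reasoning (e.g., 'low' or 'high') or integer token budgets. For details, see the integrations page or the reference for your chat model.

Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you want to invoke a custom model, or when you want to avoid the costs of a cloud-based model. Ollama is one of the easiest ways to run models locally. See the full list of local integrations on the integrations page.

Prompt caching

Many providers offer prompt caching features to reduce latency and cost on repeat processing of the same tokens. These features can be implicit or explicit:

**Implicit prompt caching:** providers automatically pass on cost savings if a request hits a cache. Examples: OpenAI and Gemini (Gemini 2.5 and above).
**Explicit caching:** providers let you manually mark cache points for greater control or to guarantee cost savings. Examples: ChatOpenAI (via prompt_cache_key), Anthropic's AnthropicPromptCachingMiddleware and cache_control options, AWS Bedrock, Gemini.

Prompt caching is often only engaged above a minimum input token threshold. See the provider pages for details.

Cache usage is reflected in the usage metadata of the model response.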

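Server-side tool use

Some providers support server-side tool-calling loops: the model can interact with web search, code interpreters, and other tools and analyze the results within a single conversational turn. If a model invokes a tool server-side, the content of the response message will include content representing the invocation and the result of the tool. Accessing the content blocks of the response returns the server-side tool calls and results in a provider-agnostic format: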

Invoke with server-side tool use

from langchain.chat_models import init_chat_model

model = init_chat_model("openai:gpt-4.1-mini")

tool = {"type": "web_search"}
model_with_tools = model.bind_tools([tool])

response = model_with_tools.invoke("What was a positive news story from today?")
response.content_blocks

Result

[
    {
        "type": "server_tool_call",
        "name": "web_search",
        "args": {
            "query": "positive news stories today",
            "type": "search"
        },
        "id": "ws_abc123"
    },
    {
        "type": "server_tool_result",
        "tool_call_id": "ws_abc123",
        "status": "success"
    },
    {
        "type": "text",
        "text": "Here are some positive news stories from today...",
        "annotations": [
            {
                "end_index": 410,
                "start_index": 337,
                "title": "article title",
                "type": "citation",
                "url": "..."
            }
        ]
    }
]

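This represents a single conversational turn; there are no associated ToolMessage objects that need to be passed in, as there are with client-side tool calling. See the integration page for your provider for available tools and usage details.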
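Since content blocks are plain dictionaries in the result shown above, they can be picked apart with ordinary Python. The sketch below works on a hand-written sample that mirrors that result; the sample values, including the example.com URL, are hypothetical.

```python
# Hand-written sample mirroring the structure of the result above.
content_blocks = [
    {
        "type": "server_tool_call",
        "name": "web_search",
        "args": {"query": "positive news stories today", "type": "search"},
        "id": "ws_abc123",
    },
    {"type": "server_tool_result", "tool_call_id": "ws_abc123", "status": "success"},
    {
        "type": "text",
        "text": "Here are some positive news stories from today...",
        "annotations": [
            {
                "type": "citation",
                "title": "article title",
                "url": "https://example.com/article",  # hypothetical URL
                "start_index": 337,
                "end_index": 410,
            }
        ],
    },
]

# Concatenate the text blocks into the final answer.
answer_text = "".join(b["text"] for b in content_blocks if b["type"] == "text")

# Map each server-side tool call id to the status of its result.
status_by_call = {
    b["tool_call_id"]: b["status"]
    for b in content_blocks
    if b["type"] == "server_tool_result"
}

# Collect cited URLs from text annotations.
citation_urls = [
    a["url"]
    for b in content_blocks
    if b["type"] == "text"
    for a in b.get("annotations", [])
    if a["type"] == "citation"
]

print(answer_text)
print(status_by_call)
print(citation_urls)
```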

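Rate limiting

Many chat model providers limit the number of invocations that can be made in a given time period. If you hit a rate limit, you will typically receive a rate limit error response from the provider and will need to wait before making more requests. To help manage rate limits, chat model integrations accept a rate_limiter parameter that can be provided during initialization to control the rate at which requests are made.

Initialize and use a rate limiter

LangChain comes with an (optional) built-in InMemoryRateLimiter. This limiter is thread-safe and can be shared by multiple threads in the same process.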

Define a rate limiter

from langchain_core.rate_limiters import InMemoryRateLimiter

rate_limiter = InMemoryRateLimiter(
    requests_per_second=0.1,  # 1 request every 10s
    check_every_n_seconds=0.1,  # Check every 100ms whether allowed to make a request
    max_bucket_size=10,  # Controls the maximum burst size.
)

model = init_chat_model(
    model="gpt-5",
    model_provider="openai",
    rate_limiter=rate_limiter  
)

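The provided rate limiter can only limit the number of requests per unit time. It will not help if you also need to limit based on the size of the requests.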
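To see how a requests-per-second rate and a maximum burst size interact, here is a minimal token-bucket sketch. It illustrates the general technique only and is not the source of InMemoryRateLimiter; the TokenBucket class and its try_acquire method are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Toy token bucket: tokens refill over time, and each request spends one."""

    def __init__(self, requests_per_second: float, max_bucket_size: float) -> None:
        self.rate = requests_per_second
        self.capacity = max_bucket_size
        self.tokens = 0.0  # start empty, so the first request has to wait
        self.last_check = None

    def try_acquire(self, now: float) -> bool:
        """Refill based on elapsed time, then spend a token if one is available."""
        if self.last_check is not None:
            elapsed = now - self.last_check
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_check = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(requests_per_second=0.5, max_bucket_size=2)
timeline = [0.0, 2.0, 2.0, 100.0, 100.0, 100.0]  # seconds at which requests arrive
decisions = [bucket.try_acquire(t) for t in timeline]
print(decisions)
```

After a long idle period the bucket holds at most max_bucket_size tokens, so only that many requests can go out back to back before the steady rate applies again. Note that the bucket counts requests, not tokens of text, which matches the limitation described above.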

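Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which lets you use model providers that have OpenAI-compatible APIs or route requests through a proxy server.

Base URL

Many model providers offer OpenAI-compatible APIs (e.g., Together AI, vLLM). You can use init_chat_model with these providers by specifying the appropriate base_url parameter: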

model = init_chat_model(
    model="MODEL_NAME",
    model_provider="openai",
    base_url="BASE_URL",
    api_key="YOUR_API_KEY",
)

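When instantiating a chat model class directly, the parameter name may vary by provider. Check the respective reference for details.

Proxy configuration

For deployments that require an HTTP proxy, some model integrations support proxy configuration: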

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o",
    openai_proxy="http://proxy.example.com:8080"
)

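Proxy support varies by integration. Check the specific model provider's reference for proxy configuration options.

Log probabilities

Certain models can be configured to return token-level log probabilities, which represent the likelihood of a given token, by setting the logprobs parameter when initializing the model: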

model = init_chat_model(
    model="gpt-4o",
    model_provider="openai"
).bind(logprobs=True)

response = model.invoke("Why do parrots talk?")
print(response.response_metadata["logprobs"])
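Log probabilities are natural logarithms of probabilities, so exponentiating recovers the probability, and summing them gives the log probability of the whole sequence. The sketch below applies this to a hand-written sample; the dictionary shape (a "content" list of token and logprob entries) is an assumption modeled on OpenAI-style metadata and may differ for other providers.

```python
import math

# Assumed shape of response.response_metadata["logprobs"]; values are made up.
logprobs = {
    "content": [
        {"token": "Parrots", "logprob": -0.1},
        {"token": " talk", "logprob": -2.3},
    ]
}

# Per-token probability: p = exp(logprob).
token_probs = [
    (entry["token"], math.exp(entry["logprob"])) for entry in logprobs["content"]
]

# Probability of the whole sequence: exp of the summed log probabilities.
sequence_prob = math.exp(sum(entry["logprob"] for entry in logprobs["content"]))

for token, prob in token_probs:
    print(f"{token!r}: {prob:.3f}")
print(f"sequence: {sequence_prob:.3f}")
```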

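Token usage

Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. For more details, see the messages guide.

Some provider APIs, notably the OpenAI and Azure OpenAI chat completions APIs, require users to opt in to receiving token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.

You can track aggregate token counts across models in an application using either a callback or a context manager, as shown below: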

Callback handler
Context manager

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import UsageMetadataCallbackHandler

model_1 = init_chat_model(model="openai:gpt-4o-mini")
model_2 = init_chat_model(model="anthropic:claude-3-5-haiku-latest")

callback = UsageMetadataCallbackHandler()
result_1 = model_1.invoke("Hello", config={"callbacks": [callback]})
result_2 = model_2.invoke("Hello", config={"callbacks": [callback]})
callback.usage_metadata

{
    'gpt-4o-mini-2024-07-18': {
        'input_tokens': 8,
        'output_tokens': 10,
        'total_tokens': 18,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    },
    'claude-3-5-haiku-20241022': {
        'input_tokens': 8,
        'output_tokens': 21,
        'total_tokens': 29,
        'input_token_details': {'cache_read': 0, 'cache_creation': 0}
    }
}

from langchain.chat_models import init_chat_model
from langchain_core.callbacks import get_usage_metadata_callback

model_1 = init_chat_model(model="openai:gpt-4o-mini")
model_2 = init_chat_model(model="anthropic:claude-3-5-haiku-latest")

with get_usage_metadata_callback() as cb:
    model_1.invoke("Hello")
    model_2.invoke("Hello")
    print(cb.usage_metadata)

{
    'gpt-4o-mini-2024-07-18': {
        'input_tokens': 8,
        'output_tokens': 10,
        'total_tokens': 18,
        'input_token_details': {'audio': 0, 'cache_read': 0},
        'output_token_details': {'audio': 0, 'reasoning': 0}
    },
    'claude-3-5-haiku-20241022': {
        'input_tokens': 8,
        'output_tokens': 21,
        'total_tokens': 29,
        'input_token_details': {'cache_read': 0, 'cache_creation': 0}
    }
}
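Because the usage metadata is keyed by model name, a grand total across models takes a short loop. This sketch works on a hand-copied subset of the sample output above rather than on a live callback.

```python
# Hand-copied subset of the sample usage metadata shown above.
usage_metadata = {
    "gpt-4o-mini-2024-07-18": {
        "input_tokens": 8,
        "output_tokens": 10,
        "total_tokens": 18,
    },
    "claude-3-5-haiku-20241022": {
        "input_tokens": 8,
        "output_tokens": 21,
        "total_tokens": 29,
    },
}

# Sum each counter across all models.
totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
for usage in usage_metadata.values():
    for key in totals:
        totals[key] += usage.get(key, 0)

print(totals)
```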

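Invocation config

When invoking a model, you can pass additional configuration through the config parameter using a RunnableConfig dictionary. This provides run-time control over execution behavior, callbacks, and metadata tracking. Common configuration options include: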

Invocation with config

response = model.invoke(
    "Tell me a joke",
    config={
        "run_name": "joke_generation",      # Custom name for this run
        "tags": ["humor", "demo"],          # Tags for categorization
        "metadata": {"user_id": "123"},     # Custom metadata
        "callbacks": [my_callback_handler], # Callback handlers
    }
)

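These configuration values are particularly useful when:

Debugging with LangSmith tracing
Implementing custom logging or monitoring
Controlling resource usage in production
Tracking invocations across complex pipelines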

Key configuration attributes

run_name

string

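Identifies this specific invocation in logs and traces. Not inherited by sub-calls.

Configurable models

You can also create a runtime-configurable model by specifying configurable_fields. If you don't specify a model value, then 'model' and 'model_provider' will be configurable by default.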

from langchain.chat_models import init_chat_model

configurable_model = init_chat_model(temperature=0)

configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "gpt-5-nano"}},  # Run with GPT-5-Nano
)
configurable_model.invoke(
    "what's your name",
    config={"configurable": {"model": "claude-sonnet-4-5"}},  # Run with Claude
)

Configurable model with default values

We can create a configurable model with default model values, specify which parameters are configurable, and add prefixes to configurable params:

first_model = init_chat_model(
        model="gpt-4.1-mini",
        temperature=0,
        configurable_fields=("model", "model_provider", "temperature", "max_tokens"),
        config_prefix="first",  # Useful when you have a chain with multiple models
)

first_model.invoke("what's your name")

first_model.invoke(
    "what's your name",
    config={
        "configurable": {
            "first_model": "claude-sonnet-4-5",
            "first_temperature": 0.5,
            "first_max_tokens": 100,
        }
    },
)
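The naming convention in the example above (a prefix of "first" turning temperature into first_temperature) can be sketched as a plain dictionary transformation. This only illustrates the convention as it appears in the example; resolve_prefixed_config is a hypothetical helper, not LangChain's internal logic.

```python
from typing import Any


def resolve_prefixed_config(
    configurable: dict[str, Any], prefix: str, fields: tuple[str, ...]
) -> dict[str, Any]:
    """Pick out '<prefix>_<field>' keys and return them under the bare field name."""
    resolved = {}
    for field in fields:
        key = f"{prefix}_{field}"
        if key in configurable:
            resolved[field] = configurable[key]
    return resolved


configurable = {
    "first_model": "claude-sonnet-4-5",
    "first_temperature": 0.5,
    "first_max_tokens": 100,
    "second_temperature": 0.9,  # belongs to another model, so it is ignored
}
fields = ("model", "model_provider", "temperature", "max_tokens")

overrides = resolve_prefixed_config(configurable, "first", fields)
print(overrides)
```

Prefixes keep the settings for several configurable models apart when they share one config dictionary in a chain.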

Using a configurable model declaratively

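We can call declarative operations like bind_tools, with_structured_output, with_configurable, etc. on a configurable model and chain a configurable model in the same way that we would a regularly instantiated chat model object.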

from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

model = init_chat_model(temperature=0)
model_with_tools = model.bind_tools([GetWeather, GetPopulation])

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC", config={"configurable": {"model": "gpt-4.1-mini"}}
).tool_calls

[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'call_Ga9m8FAArIyEjItHmztPYA22',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York, NY'},
        'id': 'call_jh2dEvBaAHRaw5JUDthOs7rt',
        'type': 'tool_call'
    }
]

model_with_tools.invoke(
    "what's bigger in 2024 LA or NYC",
        config={"configurable": {"model": "claude-sonnet-4-5"}},
).tool_calls

[
    {
        'name': 'GetPopulation',
        'args': {'location': 'Los Angeles, CA'},
        'id': 'toolu_01JMufPf4F4t2zLj7miFeqXp',
        'type': 'tool_call'
    },
    {
        'name': 'GetPopulation',
        'args': {'location': 'New York City, NY'},
        'id': 'toolu_01RQBHcE8kEEbYTuuS8WqY1u',
        'type': 'tool_call'
    }
]


LangChain v1.0
