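LLMs are powerful AI tools that can interpret and generate text much as humans do. They are versatile enough to write content, translate languages, summarize, and answer questions without being specially trained for each task. Beyond text generation, many models also support:

Tool calling - calling external tools (such as database queries or API calls) and using the results in their responses.
Structured output - constraining the model's response to follow a predefined format.
Multimodality - processing and returning data other than text, such as images, audio, and video.
Reasoning - performing multi-step reasoning to reach a conclusion.

Models are the reasoning engine of agents. They drive the agent's decision-making process, determining which tools to call, how to interpret the results, and when to provide a final answer. The quality and capabilities of the model you choose directly affect your agent's reliability and performance. Different models excel at different tasks: some are better at following complex instructions, some excel at structured reasoning, and some support larger context windows for handling more information. LangChain's standard model interface gives you access to many different provider integrations, which makes it easy to experiment with and switch between models to find the best fit for your use case.

For provider-specific integration information and capabilities, see the provider's integration page.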

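Basic usage

Models can be used in two ways:

With agents - a model can be specified dynamically when creating an agent.
Standalone - without an agent framework, a model can be called directly (outside the agent loop) for tasks such as text generation, classification, or extraction.

The same model interface works in both contexts, which gives you the flexibility to start simple and scale up to more complex agent-based workflows as needed.

Initialize a model

The easiest way to get started with a standalone model in LangChain is to use initChatModel to initialize one from the provider of your choice (examples below):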

OpenAI
Anthropic
Azure
Google Gemini
Bedrock Converse

👉 Read the OpenAI chat model integration docs

npm install @langchain/openai

import { initChatModel } from "langchain";

process.env.OPENAI_API_KEY = "your-api-key";

const model = await initChatModel("openai:gpt-4.1");

👉 Read the Anthropic chat model integration docs

npm install @langchain/anthropic

import { initChatModel } from "langchain";

process.env.ANTHROPIC_API_KEY = "your-api-key";

const model = await initChatModel("anthropic:claude-sonnet-4-5");

👉 Read the Azure chat model integration docs

npm install @langchain/azure

import { initChatModel } from "langchain";

process.env.AZURE_OPENAI_API_KEY = "your-api-key";
process.env.AZURE_OPENAI_ENDPOINT = "your-endpoint";
process.env.OPENAI_API_VERSION = "your-api-version";

const model = await initChatModel("azure_openai:gpt-4.1");

👉 Read the Google GenAI chat model integration docs

npm install @langchain/google-genai

import { initChatModel } from "langchain";

process.env.GOOGLE_API_KEY = "your-api-key";

const model = await initChatModel("google_genai:gemini-2.5-flash-lite");

👉 Read the AWS Bedrock chat model integration docs

npm install @langchain/aws

import { initChatModel } from "langchain";

// Follow the steps here to configure your credentials:
// https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html

const model = await initChatModel("anthropic.claude-3-5-sonnet-20240620-v1:0", { modelProvider: "bedrock" }); // Bedrock model ID

const response = await model.invoke("Why do parrots talk?");

See initChatModel for more details, including how to pass model parameters.

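Key methods

Invoke

The model takes messages as input and outputs a message after generating a complete response.

Stream

Invokes the model, but streams the output in real time as it is generated.

Batch

Sends multiple requests to the model in a batch for more efficient processing.

In addition to chat models, LangChain supports other related technologies, such as embedding models and vector stores. See the integrations page for details.

Parameters

A chat model accepts parameters that configure its behavior. The full set of supported parameters varies by model and provider, but the standard ones include: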

model

string

required

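The name or identifier of the specific model you want to use with the provider.

apiKey

string

The key used to authenticate with the model provider. It is usually issued when you sign up for access to the model, and is often supplied by setting an environment variable.

temperature

number

Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic.

timeout

number

The maximum time to wait for a response from the model before the request is canceled, in milliseconds.

maxTokens

number

Limits the total number of tokens in the response, effectively controlling the length of the output.

maxRetries

number

The maximum number of times the system resends a request that fails because of issues such as network timeouts or rate limits.

With initChatModel, pass these parameters inline:

Initialize using model parameters

const model = await initChatModel(
    "anthropic:claude-sonnet-4-5",
    { temperature: 0.7, timeout: 30000, maxTokens: 1000 } // timeout in milliseconds
);

Each chat model integration may have additional parameters for controlling provider-specific features. For example, ChatOpenAI has a useResponsesApi parameter that determines whether to use the OpenAI Responses API or the Chat Completions API. To find all parameters supported by a given chat model, see its integration page.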
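The temperature parameter is usually explained in terms of temperature-scaled softmax sampling. The following is a general sketch of that idea, not any provider's actual implementation: the model's raw scores (logits) are divided by the temperature before the softmax, so a low temperature sharpens the distribution toward the top-scoring token and a high temperature flattens it. The logit values are made up for illustration.

```typescript
// Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)
// A general sketch; the logits below are hypothetical.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    throw new Error("temperature must be positive");
  }
  const scaled = logits.map((z) => z / temperature);
  // Subtract the max before exponentiating for numerical stability
  const max = Math.max(...scaled);
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1]; // hypothetical scores for three candidate tokens

const cold = softmaxWithTemperature(logits, 0.2);
const warm = softmaxWithTemperature(logits, 1.5);

// A lower temperature concentrates probability on the highest-scoring token
console.log(cold.map((p) => p.toFixed(3)));
console.log(warm.map((p) => p.toFixed(3)));
```

Sampling from `cold` almost always picks the first token, while `warm` leaves real probability on the alternatives, which is why higher temperatures read as more "creative".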

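LangGraph agent architecture

The LangGraph agent architecture is a powerful, flexible framework for building stateful, multi-actor applications powered by large language models (LLMs). It extends the LangChain Expression Language with one key new abstraction: the stateful graph. This architecture lets developers create complex agent workflows in which agents can maintain context across multiple interactions, collaborate with other agents, and adapt dynamically to new information.

Core concepts

Stateful graph

At the heart of LangGraph is the stateful graph. The graph defines the overall structure of an agent application, made up of nodes (which represent work or logic) and edges (which represent transition logic). The graph's state is updated after each transition, allowing the agent to maintain context and memory over time.

Nodes

Nodes are the basic units of computation in the graph. Each node represents a specific function or operation, such as:

Calling an LLM
Using a tool
Processing user input
Updating the state

Edges

Edges define the transition logic between nodes in the graph. LangGraph supports several kinds of edges:

Normal edges: define a fixed transition sequence
Conditional edges: choose the next node based on the current state
Tool-call routing: a conditional edge that sends control to a tool-executing node when the model requests a tool call, so the tool results can be handled

State

State is the shared memory the graph maintains during execution. It contains all the information the agent needs to track. In LangGraph.js it is typically declared with Annotation.Root or a Zod schema (in Python, a TypedDict plays this role). State is passed between nodes and can be used to influence future decisions.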
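To make the node, edge, and state concepts concrete, here is a tiny self-contained graph runner. It is a hypothetical illustration written from scratch, not LangGraph's API or implementation (the names `SimpleGraph` and `END` are assumptions for the sketch): nodes are functions that return partial state updates, a normal edge is a fixed next-node name, and a conditional edge is a function that picks the next node from the current state.

```typescript
// A toy stateful graph runner (hypothetical; not the LangGraph API).
type State = { messages: string[]; steps: number };
type NodeFn = (state: State) => Partial<State>;
type Router = (state: State) => string;

const END = "__end__";

class SimpleGraph {
  private nodes = new Map<string, NodeFn>();
  private edges = new Map<string, string | Router>();

  addNode(name: string, fn: NodeFn): this {
    this.nodes.set(name, fn);
    return this;
  }

  // A normal edge passes a node name; a conditional edge passes a router function
  addEdge(from: string, to: string | Router): this {
    this.edges.set(from, to);
    return this;
  }

  // Runs nodes until END is reached or maxSteps transitions have happened
  run(entry: string, initial: State, maxSteps = 25): State {
    let state = initial;
    let current = entry;
    for (let i = 0; i < maxSteps && current !== END; i++) {
      const node = this.nodes.get(current);
      if (!node) throw new Error(`Unknown node: ${current}`);
      // Merge the node's partial update into the shared state
      state = { ...state, ...node(state) };
      const edge = this.edges.get(current);
      if (edge === undefined) throw new Error(`No outgoing edge from: ${current}`);
      current = typeof edge === "function" ? edge(state) : edge;
    }
    return state;
  }
}

const graph = new SimpleGraph()
  .addNode("think", (s) => ({ messages: [...s.messages, `thought ${s.steps + 1}`], steps: s.steps + 1 }))
  .addNode("answer", (s) => ({ messages: [...s.messages, "final answer"] }))
  // Conditional edge: loop back to "think" until three steps have run
  .addEdge("think", (s) => (s.steps < 3 ? "think" : "answer"))
  // Normal edge: always finish after "answer"
  .addEdge("answer", END);

const result = graph.run("think", { messages: [], steps: 0 });
console.log(result.messages);
```

The conditional edge is what makes cycles possible: the same "think" node runs repeatedly until the state says otherwise, which a purely linear chain cannot express.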

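Key features

Persistence and checkpointing

LangGraph has built-in support for persistence and checkpointing. This lets you:

Pause and resume agent execution
Implement human-in-the-loop workflows
Use time-travel debugging and recover from errors
Retain memory across state updates

Parallel execution

LangGraph supports executing nodes in parallel, which can significantly improve the performance of complex workflows. You can configure the graph to run multiple nodes in parallel and wait for all of them to finish before continuing.

Human-in-the-loop

The architecture is designed to support human-in-the-loop workflows. You can pause an agent's execution to wait for human input and then resume processing seamlessly.

Time travel

With the checkpointing system, you can "time travel" to any previous state of the graph. This is useful for debugging, analysis, and understanding the agent's decision-making process.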
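Checkpointing and time travel can be illustrated with a minimal in-memory checkpointer that stores a snapshot of the state after every step, so execution can later be rewound to an earlier snapshot and resumed from there. The `InMemoryCheckpointer` class below is a hypothetical sketch, not LangGraph's checkpointer interface (LangGraph.js ships its own savers, such as `MemorySaver`).

```typescript
// Hypothetical in-memory checkpointer illustrating checkpoints and "time travel".
type CounterState = { count: number; log: string[] };

class InMemoryCheckpointer<S> {
  private snapshots: string[] = [];

  // Serialize each snapshot so later changes cannot alter saved history
  save(state: S): number {
    this.snapshots.push(JSON.stringify(state));
    return this.snapshots.length - 1; // checkpoint id
  }

  load(id: number): S {
    const snapshot = this.snapshots[id];
    if (snapshot === undefined) throw new Error(`No checkpoint ${id}`);
    return JSON.parse(snapshot) as S;
  }

  // Discard checkpoints after `id`, so resuming starts a new branch of history
  rewind(id: number): S {
    const state = this.load(id);
    this.snapshots = this.snapshots.slice(0, id + 1);
    return state;
  }

  get size(): number {
    return this.snapshots.length;
  }
}

// One "node" of work: increment the counter and record which step ran
function step(state: CounterState, label: string): CounterState {
  return { count: state.count + 1, log: [...state.log, label] };
}

const checkpointer = new InMemoryCheckpointer<CounterState>();
let state: CounterState = { count: 0, log: [] };
checkpointer.save(state); // checkpoint 0: initial state

for (const label of ["a", "b", "c"]) {
  state = step(state, label);
  checkpointer.save(state); // checkpoints 1 through 3
}

// Time travel: go back to checkpoint 1 (after "a") and take a different path
state = checkpointer.rewind(1);
state = step(state, "x");
checkpointer.save(state);

console.log(state, checkpointer.size);
```

Because every step is recorded, the same history also supports pausing (stop after a save) and resuming (load the latest checkpoint), which is the basis of the human-in-the-loop pattern described above.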

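Relationship to the LangChain Expression Language

LangGraph builds on the LangChain Expression Language (LCEL) but extends its capabilities:

LCEL is well suited to creating simple linear chains
LangGraph is designed for complex, stateful, and multi-agent applications

You can think of LangGraph as a superset of LCEL that adds support for cycles, conditionals, and persistent state.

Use cases

LangGraph is particularly well suited to:

Multi-agent systems: multiple agents collaborating to solve complex problems
Long-running tasks: tasks that need to maintain state over time
Complex workflows: workflows involving conditional logic and loops
Interactive applications: applications that require human-in-the-loop interaction
Research and experimentation: complex agent behavior that requires in-depth analysis and debugging

Example: a simple agent

The following is a simple agent example. It uses createAgent, imported from "langchain" (which runs agents on LangGraph), together with the getWeather tool defined in the Tool calling section:

const agent = createAgent({ model: "openai:gpt-4.1", tools: [getWeather] });
const result = await agent.invoke({ messages: [{ role: "user", content: "What's the weather in Boston?" }] });

This example creates a basic agent that can:

Receive user input
Decide whether it needs to use a tool
Call the tool (if needed)
Provide a final response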
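The loop behind those four steps can be sketched without any model or network access. In the following hypothetical example, `fakeModel` stands in for a chat model: it requests the `get_weather` tool when no tool result is present yet and answers once one is available. The message shapes are simplified assumptions, not LangChain's message classes.

```typescript
// Hypothetical, self-contained agent loop; fakeModel stands in for a real chat model.
type ToolCall = { id: string; name: string; args: Record<string, string> };
type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; content: string; toolCallId: string };

const tools: Record<string, (args: Record<string, string>) => string> = {
  get_weather: (args) => `It's sunny in ${args.location}.`,
};

// Requests a tool on the first turn, then answers using the tool result
function fakeModel(messages: Message[]): Message {
  const toolResult = messages.find((m) => m.role === "tool");
  if (!toolResult) {
    return {
      role: "assistant",
      content: "",
      toolCalls: [{ id: "call_1", name: "get_weather", args: { location: "Boston" } }],
    };
  }
  return { role: "assistant", content: `Answer: ${toolResult.content}`, toolCalls: [] };
}

function runAgent(input: string, maxTurns = 5): string {
  // 1. Receive user input
  const messages: Message[] = [{ role: "user", content: input }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = fakeModel(messages);
    messages.push(reply);
    // 2. Decide: a reply without tool calls is the final response
    if (reply.role !== "assistant" || reply.toolCalls.length === 0) {
      return reply.content; // 4. Provide the final response
    }
    // 3. Call each requested tool and feed the result back to the model
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const content = tool ? tool(call.args) : `Unknown tool: ${call.name}`;
      messages.push({ role: "tool", content, toolCallId: call.id });
    }
  }
  throw new Error("Agent did not finish within maxTurns");
}

console.log(runAgent("What's the weather in Boston?"));
```

The `maxTurns` cap is a design choice worth keeping in any real loop: a model that keeps requesting tools would otherwise never terminate.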

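Best practices

State design

Keep the state minimal and focused
Use a typed state schema (for example, Annotation.Root or Zod in LangGraph.js) for type safety
Consider which information needs to be shared between nodes

Node design

Keep nodes small and focused
Give each node a single, clear responsibility
Use descriptive names to make debugging easier

Error handling

Implement appropriate error handling and retry logic
Use checkpoints to recover from errors
Log errors for debugging

Testing

Test each node individually
Test the workflow of the whole graph
Use the time-travel feature for debugging

Conclusion

The LangGraph agent architecture provides a powerful, flexible framework for building complex, stateful AI applications. By introducing the concept of a stateful graph, it enables developers to create agents that maintain context over time, collaborate with other agents, and adapt dynamically to new information. Whether you are building multi-agent systems, long-running tasks, or interactive applications, LangGraph provides the tools and abstractions needed to create robust, scalable solutions.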

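Invocation

A chat model must be invoked to generate output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.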

Single message

const response = await model.invoke("Why do parrots have colorful feathers?");
console.log(response);

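A list of messages can be provided to the model to represent conversation history. Each message has a role that the model uses to identify who sent that message in the conversation. See the messages guide for more details on roles, types, and content.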

Object format

const conversation = [
  { role: "system", content: "You are a helpful assistant that translates English to French." },
  { role: "user", content: "Translate: I love programming." },
  { role: "assistant", content: "J'adore la programmation." },
  { role: "user", content: "Translate: I love building applications." },
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

Message objects

import { HumanMessage, AIMessage, SystemMessage } from "langchain";

const conversation = [
  new SystemMessage("You are a helpful assistant that translates English to French."),
  new HumanMessage("Translate: I love programming."),
  new AIMessage("J'adore la programmation."),
  new HumanMessage("Translate: I love building applications."),
];

const response = await model.invoke(conversation);
console.log(response);  // AIMessage("J'adore créer des applications.")

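Stream

Most models can stream their output as it is generated. By displaying output progressively, streaming significantly improves the user experience, especially for longer responses. Calling stream() returns an async iterable stream that yields output chunks as they are generated. You can use a loop to process each chunk in real time: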

const stream = await model.stream("Why do parrots have colorful feathers?");
for await (const chunk of stream) {
  console.log(chunk.text)
}

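Unlike invoke(), which returns a single AIMessage after the model has finished generating the full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, the chunks in a stream are designed to be aggregated into a complete message with concat():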

Construct AIMessage

import { AIMessageChunk } from "@langchain/core/messages";

const stream = await model.stream("What color is the sky?");
let full: AIMessageChunk | null = null;
for await (const chunk of stream) {
  full = full ? full.concat(chunk) : chunk;
  console.log(full.text);
}

// The
// The sky
// The sky is
// The sky is typically
// The sky is typically blue
// ...

console.log(full?.contentBlocks);
// [{"type": "text", "text": "The sky is typically blue..."}]

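The resulting message can be treated the same as a message generated with invoke(): for example, it can be aggregated into a message history and passed back to the model as conversational context.

Streaming only works if every step in the program knows how to process a stream of chunks. For example, an application that cannot process streams would need to hold the entire output in memory before it can process it.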
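The aggregation pattern used with AIMessageChunk (fold each chunk into a running message with concat()) can be reproduced without a model. This sketch uses a hypothetical `TextChunk` class and an async generator as stand-ins for AIMessageChunk and `model.stream()`; it is not LangChain's implementation. It also shows why a step that only handles complete values has to wait: the aggregate is only whole once the stream ends.

```typescript
// Hypothetical stand-ins for AIMessageChunk and model.stream(); not LangChain's classes.
class TextChunk {
  readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  // Combine two chunks into one, the way streamed message chunks are merged
  concat(other: TextChunk): TextChunk {
    return new TextChunk(this.text + other.text);
  }
}

async function* fakeStream(pieces: string[]): AsyncGenerator<TextChunk> {
  for (const piece of pieces) {
    await Promise.resolve(); // yield control, as if each token arrived later
    yield new TextChunk(piece);
  }
}

// Fold every chunk into a running aggregate; the result is complete only at the end
async function collect(stream: AsyncIterable<TextChunk>): Promise<TextChunk | null> {
  let full: TextChunk | null = null;
  for await (const chunk of stream) {
    full = full ? full.concat(chunk) : chunk;
  }
  return full;
}

collect(fakeStream(["The", " sky", " is", " typically", " blue."])).then((full) => {
  console.log(full?.text);
});
```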

Advanced streaming topics

"Auto-streaming" chat models

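LangChain simplifies streaming from chat models by automatically enabling streaming mode in certain cases, even when you don't call a streaming method explicitly. This is especially useful when you use the non-streaming invoke method but still want to stream the entire application, including intermediate results from the chat model. For example, in a LangGraph agent you can call model.invoke() inside a node, but if the graph is run in streaming mode, LangChain automatically delegates to streaming.

How it works

When you invoke() a chat model, LangChain automatically switches to an internal streaming mode if it detects that you are trying to stream the overall application. As far as the code using invoke is concerned, the result of the call is the same; however, while the chat model is being streamed, LangChain takes care of firing the handleLLMNewToken callback in LangChain's callback system. These callback events allow LangGraph's stream() and streamEvents() to surface the chat model's output in real time.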

Streaming events

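LangChain chat models can also stream semantic events using streamEvents(). This simplifies filtering based on event types and other metadata, and aggregates the full message in the background. See the example below.

const stream = await model.streamEvents("Hello", { version: "v2" });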
for await (const event of stream) {
    if (event.event === "on_chat_model_start") {
        console.log(`Input: ${event.data.input}`);
    }
    if (event.event === "on_chat_model_stream") {
        console.log(`Token: ${event.data.chunk.text}`);
    }
    if (event.event === "on_chat_model_end") {
        console.log(`Full message: ${event.data.output.text}`);
    }
}

Input: Hello
Token: Hi
Token:  there
Token: !
Token:  How
Token:  can
Token:  I
...
Full message: Hi there! How can I help today?

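See streamEvents() for event types and other details.

Batch

Batching a collection of independent requests to the model can significantly improve performance and reduce costs, because the requests can be processed in parallel: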

Batch

const responses = await model.batch([
  "Why do parrots have colorful feathers?",
  "How do airplanes fly?",
  "What is quantum computing?",
]);
for (const response of responses) {
  console.log(response);
}

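When processing a large number of inputs with batch(), you may want to control the maximum number of parallel calls. You can do this by setting the maxConcurrency property on the RunnableConfig object.

Batch with max concurrency

const responses = await model.batch(
  listOfInputs,
  {
    maxConcurrency: 5,  // Limit to 5 parallel calls
  }
);

See the RunnableConfig reference for the full list of supported properties.

See the reference documentation for more details on batching.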
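The effect of maxConcurrency can be pictured as a worker pool: at most N requests are in flight at once, and results come back in input order. The following is a generic sketch of that pattern, not how LangChain implements batch(); `makeFakeModel` is a hypothetical stand-in for a model that records how many calls overlap.

```typescript
// Generic concurrency-limited batch (a sketch; not LangChain's implementation).
async function batchWithLimit<I, O>(
  inputs: I[],
  worker: (input: I) => Promise<O>,
  maxConcurrency: number,
): Promise<O[]> {
  const results: O[] = new Array(inputs.length);
  let next = 0; // index of the next input to start

  // Each runner keeps pulling the next unstarted input until none remain
  async function runner(): Promise<void> {
    while (next < inputs.length) {
      const index = next++;
      results[index] = await worker(inputs[index]);
    }
  }

  const runners = Array.from({ length: Math.min(maxConcurrency, inputs.length) }, runner);
  await Promise.all(runners);
  return results; // same order as inputs
}

// Hypothetical stand-in for a model call that records peak concurrency
function makeFakeModel() {
  let inFlight = 0;
  let peak = 0;
  return {
    async request(prompt: string): Promise<string> {
      inFlight++;
      peak = Math.max(peak, inFlight);
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
      inFlight--;
      return `answer to: ${prompt}`;
    },
    peak: () => peak,
  };
}

const fake = makeFakeModel();
const prompts = Array.from({ length: 12 }, (_, i) => `question ${i}`);
batchWithLimit(prompts, fake.request, 5).then((answers) => {
  console.log(`${answers.length} answers, peak concurrency ${fake.peak()}`);
});
```

Writing each result to its input's index, rather than pushing as calls finish, is what keeps the output aligned with the input list even though calls complete out of order.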

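Tool calling

Models can request to call tools that perform tasks such as fetching data from a database, searching the web, or running code. A tool pairs:

A schema, including the tool's name, a description, and argument definitions (often a JSON schema)
A function or async function to execute

You may hear the term "function calling". We use it interchangeably with "tool calling".

To make the tools you define available to the model, you must bind them with bindTools(). In subsequent invocations, the model can choose to call any of the bound tools as needed. Some model providers offer built-in tools that can be enabled via model or invocation parameters (e.g., ChatOpenAI, ChatAnthropic). Check the respective provider reference for details.

See the tools guide for details and other options for creating tools.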

Binding user tools

import { tool } from "langchain";
import * as z from "zod";
import { ChatOpenAI } from "@langchain/openai";

const getWeather = tool(
  (input) => `It's sunny in ${input.location}.`,
  {
    name: "get_weather",
    description: "Get the weather at a location.",
    schema: z.object({
      location: z.string().describe("The location to get the weather for"),
    }),
  },
);

const model = new ChatOpenAI({ model: "gpt-4o" });
const modelWithTools = model.bindTools([getWeather]);  

const response = await modelWithTools.invoke("What's the weather like in Boston?");
const toolCalls = response.tool_calls ?? [];
for (const toolCall of toolCalls) {
  // View tool calls made by the model
  console.log(`Tool: ${toolCall.name}`);
  console.log(`Args: ${JSON.stringify(toolCall.args)}`);
}

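When you bind user-defined tools, the model's response contains a request to execute a tool. When you use a model on its own rather than through an agent, it is up to you to execute the requested action and return the result to the model for use in subsequent reasoning. When you use an agent, the agent loop handles the tool execution loop for you. Below are some common ways to use tool calling.

Tool execution loop

When the model returns tool calls, you need to execute the tools and pass the results back to the model. This creates a conversation loop in which the model can use the tool results to generate its final response. LangChain includes agent abstractions that handle this orchestration for you. Here is a simple example: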

Tool execution loop

import { HumanMessage, type BaseMessage } from "@langchain/core/messages";

// Bind (potentially multiple) tools to the model
const modelWithTools = model.bindTools([getWeather]);

// Step 1: Model generates tool calls
const messages: BaseMessage[] = [new HumanMessage("What's the weather in Boston?")];
const aiMsg = await modelWithTools.invoke(messages);
messages.push(aiMsg);

// Step 2: Execute tools and collect results
for (const toolCall of aiMsg.tool_calls ?? []) {
  // Invoking a tool with a ToolCall returns a ToolMessage with the matching tool_call_id
  const toolResult = await getWeather.invoke(toolCall);
  messages.push(toolResult);
}

// Step 3: Pass results back to model for final response
const finalResponse = await modelWithTools.invoke(messages);
console.log(finalResponse.text);
// e.g. "It's currently sunny in Boston."

Each ToolMessage returned by a tool includes a tool_call_id that matches the original tool call, which helps the model correlate results with requests.
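Matching results to requests by id can be sketched as a small dispatcher: look up each requested tool by name, run it, and wrap the output in a tool message that echoes the call's id. The types and the `get_weather`/`get_time` implementations below are hypothetical simplifications, not LangChain's ToolCall or ToolMessage classes.

```typescript
// Hypothetical, simplified shapes; LangChain's ToolCall/ToolMessage carry more fields.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type ToolResultMessage = { role: "tool"; tool_call_id: string; content: string };

const registry: Record<string, (args: Record<string, unknown>) => string> = {
  get_weather: (args) => `It's sunny in ${String(args.location)}.`,
  get_time: (args) => `It is 09:00 in ${String(args.location)}.`,
};

function executeToolCalls(calls: ToolCall[]): ToolResultMessage[] {
  return calls.map((call): ToolResultMessage => {
    const tool = registry[call.name];
    // Report unknown tools back to the model instead of throwing
    const content = tool ? tool(call.args) : `Error: unknown tool "${call.name}"`;
    // Echo the id so the model can pair this result with its request
    return { role: "tool", tool_call_id: call.id, content };
  });
}

const results = executeToolCalls([
  { id: "call_1", name: "get_weather", args: { location: "Boston" } },
  { id: "call_2", name: "get_time", args: { location: "Tokyo" } },
]);
console.log(results);
```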

Forcing tool calls

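By default, the model is free to choose which bound tool to use based on the user's input. However, you may want to force a tool choice, ensuring the model uses a particular tool or any tool from a given list:

const modelWithTools = model.bindTools([getWeather], { tool_choice: "any" }); // or "get_weather" to force that specific tool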

Parallel tool calls

Many models support calling multiple tools in parallel when appropriate. This allows the model to gather information from different sources simultaneously.

Parallel tool calls

const modelWithTools = model.bindTools([getWeather]);

const response = await modelWithTools.invoke(
    "What's the weather in Boston and Tokyo?"
);


// The model may generate multiple tool calls
console.log(response.tool_calls);
// [
//   { name: 'get_weather', args: { location: 'Boston' }, id: 'call_1' },
//   { name: 'get_weather', args: { location: 'Tokyo' }, id: 'call_2' }
// ]


// Execute all tool calls concurrently
const results = await Promise.all(
    (response.tool_calls ?? [])
        .filter((toolCall) => toolCall.name === "get_weather")
        .map((toolCall) => getWeather.invoke(toolCall))
);

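The model determines when parallel execution is appropriate based on whether the requested operations are independent of one another.

Most models that support tool calling enable parallel tool calls by default. Some (including OpenAI and Anthropic) let you disable this feature. To do so, set parallel_tool_calls to false:

model.bindTools([getWeather], { parallel_tool_calls: false });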

Streaming tool calls

When streaming responses, tool calls are built up progressively through ToolCallChunk objects. This lets you see tool calls as they are generated, without waiting for the full response.

Streaming tool calls

const stream = await modelWithTools.stream(
    "What's the weather in Boston and Tokyo?"
);
for await (const chunk of stream) {
    // Tool call chunks arrive progressively
    for (const toolChunk of chunk.tool_call_chunks ?? []) {
        console.log(`Tool: ${toolChunk.name ?? ""}`);
        console.log(`Args: ${toolChunk.args ?? ""}`);
    }
}

// Output:
// Tool: get_weather
// Args:
// Tool:
// Args: {"loc
// Tool:
// Args: ation": "Boston"}
// Tool: get_weather
// Args:
// Tool:
// Args: {"location": "Tokyo"}

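You can accumulate chunks to build complete tool calls:

Accumulate tool calls

import { AIMessageChunk } from "@langchain/core/messages";

let full: AIMessageChunk | null = null;
const stream = await modelWithTools.stream("What's the weather in Boston?");
for await (const chunk of stream) {
    full = full ? full.concat(chunk) : chunk;
    console.log(full.contentBlocks);
}
// The aggregated message exposes the fully parsed tool calls
console.log(full?.tool_calls);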
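Accumulating tool calls from a stream amounts to concatenating each call's partial name and argument strings, grouped by the chunk's index, and parsing the JSON arguments only once the stream has finished. The following is a simplified sketch of that merge, using hypothetical fragments shaped after the streaming output above; concat() does this for you, and LangChain's internals may differ.

```typescript
// Simplified chunk shape, modeled on ToolCallChunk (name, args, id, index).
type ToolCallChunk = { index: number; id?: string; name?: string; args?: string };
type ParsedToolCall = { id?: string; name: string; args: Record<string, unknown> };

function mergeToolCallChunks(chunks: ToolCallChunk[]): ParsedToolCall[] {
  const byIndex = new Map<number, { id?: string; name: string; args: string }>();
  for (const chunk of chunks) {
    const current = byIndex.get(chunk.index) ?? { name: "", args: "" };
    byIndex.set(chunk.index, {
      id: current.id ?? chunk.id,
      name: current.name + (chunk.name ?? ""),
      args: current.args + (chunk.args ?? ""),
    });
  }
  // Arguments are only valid JSON once all fragments have arrived
  return [...byIndex.entries()]
    .sort(([a], [b]) => a - b)
    .map(([, call]) => ({ id: call.id, name: call.name, args: JSON.parse(call.args || "{}") }));
}

// Hypothetical fragments for two parallel calls, split mid-JSON as in a real stream
const chunks: ToolCallChunk[] = [
  { index: 0, id: "call_1", name: "get_weather", args: "" },
  { index: 0, args: '{"loc' },
  { index: 0, args: 'ation": "Boston"}' },
  { index: 1, id: "call_2", name: "get_weather", args: "" },
  { index: 1, args: '{"location": "Tokyo"}' },
];

console.log(mergeToolCallChunks(chunks));
```

Grouping by `index` rather than by `id` matters because only the first fragment of each call typically carries the id and name.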

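Structured output

Models can be asked to provide their response in a format that matches a given schema. This helps ensure the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.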

Zod
JSON Schema

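A Zod schema is the preferred way to define an output schema. When a Zod schema is provided, the model output is also validated against the schema using Zod's parse method.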

        import * as z from "zod";

        const Movie = z.object({
          title: z.string().describe("The title of the movie"),
          year: z.number().describe("The year the movie was released"),
          director: z.string().describe("The director of the movie"),
          rating: z.number().describe("The movie's rating out of 10"),
        });

        const modelWithStructure = model.withStructuredOutput(Movie);

        const response = await modelWithStructure.invoke("Provide details about the movie Inception");
        console.log(response);
        // {
        //   title: "Inception",
        //   year: 2010,
        //   director: "Christopher Nolan",
        //   rating: 8.8,
        // }

For maximum control or interoperability, you can provide a raw JSON Schema.

const jsonSchema = {
  "title": "Movie",
  "description": "A movie with details",
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "The title of the movie",
    },
    "year": {
      "type": "integer",
      "description": "The year the movie was released",
    },
    "director": {
      "type": "string",
      "description": "The director of the movie",
    },
    "rating": {
      "type": "number",
      "description": "The movie's rating out of 10",
    },
  },
  "required": ["title", "year", "director", "rating"],
}

const modelWithStructure = model.withStructuredOutput(
  jsonSchema,
  { method: "jsonSchema" },
)

const response = await modelWithStructure.invoke("Provide details about the movie Inception")
console.log(response)  // {'title': 'Inception', 'year': 2010, ...}

Key considerations for structured output:

Method parameter: some providers support different methods ('jsonSchema', 'functionCalling', 'jsonMode')
Include raw: use includeRaw: true to get both the parsed output and the raw AIMessage
Validation: Zod schemas provide automatic validation, while JSON Schema requires manual validation
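Because a raw JSON Schema is not validated automatically, you might check the parsed object yourself before using it. The sketch below is a deliberately minimal, hypothetical checker that only handles a flat object schema with `required` fields and the primitive types used in the Movie schema (`string`, `integer`, `number`); for real use, a full JSON Schema validator library such as Ajv is the safer choice.

```typescript
// Minimal flat-object checker (hypothetical); handles only "string", "integer", "number".
type PropertySchema = { type: "string" | "integer" | "number"; description?: string };
type ObjectSchema = {
  type: "object";
  properties: Record<string, PropertySchema>;
  required?: string[];
};

// Returns a list of problems; an empty list means the value passed
function validateFlat(schema: ObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["value is not an object"];
  }
  const obj = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push(`missing required field "${key}"`);
  }
  for (const [key, prop] of Object.entries(schema.properties)) {
    if (!(key in obj)) continue;
    const v = obj[key];
    const ok =
      prop.type === "string" ? typeof v === "string"
      : prop.type === "integer" ? Number.isInteger(v)
      : typeof v === "number" && Number.isFinite(v);
    if (!ok) errors.push(`field "${key}" is not of type ${prop.type}`);
  }
  return errors;
}

const movieSchema: ObjectSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    year: { type: "integer" },
    director: { type: "string" },
    rating: { type: "number" },
  },
  required: ["title", "year", "director", "rating"],
};

console.log(validateFlat(movieSchema, { title: "Inception", year: 2010, director: "Christopher Nolan", rating: 8.8 }));
console.log(validateFlat(movieSchema, { title: "Inception", year: "2010" }));
```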

Example: Message output alongside parsed structure

It can be useful to return the raw AIMessage object alongside the parsed representation, to access response metadata such as token counts. To do this, set includeRaw: true when calling withStructuredOutput:

import * as z from "zod";

const Movie = z.object({
  title: z.string().describe("The title of the movie"),
  year: z.number().describe("The year the movie was released"),
  director: z.string().describe("The director of the movie"),
  rating: z.number().describe("The movie's rating out of 10"),
});

const modelWithStructure = model.withStructuredOutput(Movie, { includeRaw: true });

const response = await modelWithStructure.invoke("Provide details about the movie Inception");
console.log(response);
// {
//   raw: AIMessage { ... },
//   parsed: { title: "Inception", ... }
// }

Example: Nested structures

Schemas can be nested:

import * as z from "zod";

const Actor = z.object({
  name: z.string(),
  role: z.string(),
});

const MovieDetails = z.object({
  title: z.string(),
  year: z.number(),
  cast: z.array(Actor),
  genres: z.array(z.string()),
  budget: z.number().nullable().describe("Budget in millions USD"),
});

const modelWithStructure = model.withStructuredOutput(MovieDetails);

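Supported models

LangChain supports all major model providers, including OpenAI, Anthropic, Google, Azure, AWS Bedrock, and more. Each provider offers a variety of models with different capabilities. For the full list of models supported by LangChain, see the integrations page.

Advanced topics

Multimodal

Some models can process and return non-text data such as images, audio, and video. You can pass non-text data to a model by providing content blocks.

All LangChain chat models with underlying multimodal capabilities support:

Data in the cross-provider standard format (see our messages guide)
The OpenAI chat completions format
Any format native to that specific provider (e.g., Anthropic models accept the Anthropic native format)

See the multimodal section of the messages guide for details. Some models can also return multimodal data in their responses. If invoked to do so, the resulting AIMessage will contain content blocks with multimodal types.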

Multimodal output

const response = await model.invoke("Create a picture of a cat");
console.log(response.contentBlocks);
// [
//   { type: "text", text: "Here's a picture of a cat" },
//   { type: "image", data: "...", mimeType: "image/jpeg" },
// ]

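See the integrations page for provider-specific details.

Reasoning

Newer models are capable of performing multi-step reasoning to reach a conclusion. This involves breaking complex problems down into smaller, more manageable steps. **If the underlying model supports it,** you can surface this reasoning process to better understand how the model arrived at its final answer.

const stream = await model.stream("Why do parrots have colorful feathers?");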
for await (const chunk of stream) {
    const reasoningSteps = chunk.contentBlocks.filter(b => b.type === "reasoning");
    console.log(reasoningSteps.length > 0 ? reasoningSteps : chunk.text);
}

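Depending on the model, you can sometimes specify how much effort it should put into reasoning. Likewise, you can ask the model to turn reasoning off entirely. This may take the form of categorical reasoning "tiers" (e.g., 'low' or 'high') or an integer token budget. For details, see the integration page or reference documentation for your chat model.

Local models

LangChain supports running models locally on your own hardware. This is useful when data privacy is critical, when you need to invoke a custom model, or when you want to avoid the costs of cloud-based models. Ollama is one of the easiest ways to run models locally. See the integrations page for the full list of local integrations.

Prompt caching

Many providers offer prompt caching to reduce latency and cost when the same tokens are processed repeatedly. These features can be implicit or explicit:

**Implicit prompt caching:** providers automatically pass on cost savings when a request hits the cache. Examples: OpenAI and Gemini (Gemini 2.5 and above).
**Explicit caching:** providers let you manually specify cache points for finer-grained control or guaranteed cost savings. Examples: ChatOpenAI (via prompt_cache_key), Anthropic's AnthropicPromptCachingMiddleware and cache_control option, AWS Bedrock, Gemini.

Prompt caching is usually only enabled above a minimum input-token threshold. See the provider pages for details.

Cache usage is reflected in the usage metadata of the model response.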

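Server-side tool use

Some providers support server-side tool-calling loops: within a single conversational turn, the model can interact with web search, code interpreters, and other tools, and analyze the results. If the model invokes a tool server-side, the content of the response message includes content representing the tool invocation and its result. Accessing the response's content blocks returns the server-side tool calls and results in a provider-agnostic format: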

import { initChatModel } from "langchain";

const model = await initChatModel("openai:gpt-4.1-mini");
const modelWithTools = model.bindTools([{ type: "web_search" }])

const message = await modelWithTools.invoke("What was a positive news story from today?");
console.log(message.contentBlocks);

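This represents a single conversational turn; unlike client-side tool calling, there are no associated ToolMessage objects to pass back in. See the integration page for your provider for available tools and usage details.

Base URL or proxy

For many chat model integrations, you can configure the base URL for API requests, which lets you use model providers that offer an OpenAI-compatible API or route requests through a proxy server.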

Base URL

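Many model providers offer OpenAI-compatible APIs (e.g., Together AI, vLLM). You can use initChatModel with these providers by specifying the appropriate base URL; the OpenAI integration reads it from configuration.baseURL:

const model = await initChatModel(
    "MODEL_NAME",
    {
        modelProvider: "openai",
        configuration: { baseURL: "BASE_URL" },
        apiKey: "YOUR_API_KEY",
    }
);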

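When instantiating a chat model class directly, parameter names may vary by provider. See the respective reference documentation for details.

Log probabilities

Some models can return token-level log probabilities, representing the likelihood of each generated token, when the logprobs parameter is set at initialization:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
    model: "gpt-4o",
    logprobs: true,
});

const responseMessage = await model.invoke("Why do parrots talk?");

// Log probabilities for the first five tokens
console.log(responseMessage.response_metadata.logprobs.content.slice(0, 5));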
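Log probabilities are natural logarithms, so exp(logprob) recovers a token's probability, and the mean negative logprob over a sequence gives its perplexity, exp(-(1/N) Σ log p_i). The sketch below applies both to a hypothetical array shaped like the entries in `response_metadata.logprobs.content` (objects with `token` and `logprob` fields); the numbers are invented for illustration.

```typescript
// Entries shaped like OpenAI logprobs content items; the values are made up.
type TokenLogprob = { token: string; logprob: number };

function toProbabilities(entries: TokenLogprob[]): { token: string; probability: number }[] {
  // logprob is a natural log, so exp() converts it back to a probability in [0, 1]
  return entries.map(({ token, logprob }) => ({ token, probability: Math.exp(logprob) }));
}

// Perplexity: exp of the average negative log probability per token
function perplexity(entries: TokenLogprob[]): number {
  if (entries.length === 0) throw new Error("need at least one token");
  const meanLogprob = entries.reduce((sum, e) => sum + e.logprob, 0) / entries.length;
  return Math.exp(-meanLogprob);
}

const sample: TokenLogprob[] = [
  { token: "Par", logprob: Math.log(0.9) },
  { token: "rots", logprob: Math.log(0.8) },
  { token: " talk", logprob: Math.log(0.5) },
];

console.log(toProbabilities(sample));
console.log(perplexity(sample).toFixed(3));
```

A perplexity close to 1 means the model was confident about every token; low-probability tokens such as " talk" above pull it upward.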

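Token usage

Many model providers return token usage information as part of the invocation response. When available, this information is included on the AIMessage objects produced by the corresponding model. See the messages guide for more details.

Some provider APIs, notably OpenAI and Azure OpenAI chat completions, require users to opt in to receive token usage data in streaming contexts. See the streaming usage metadata section of the integration guide for details.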
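When you make several calls, you may want to total their usage. The sketch below sums records shaped like LangChain's `usage_metadata` (`input_tokens`, `output_tokens`, `total_tokens`, and the optional `input_token_details.cache_read` count that reports prompt-cache hits) using a local type; the sample numbers are hypothetical, and in practice each record would come from `message.usage_metadata` when the provider supplies it.

```typescript
// Local type mirroring the fields of LangChain's UsageMetadata used here.
type Usage = {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  input_token_details?: { cache_read?: number };
};

function sumUsage(records: (Usage | undefined)[]): Usage & { cache_read: number } {
  const total = { input_tokens: 0, output_tokens: 0, total_tokens: 0, cache_read: 0 };
  for (const u of records) {
    if (!u) continue; // some providers or calls may not report usage
    total.input_tokens += u.input_tokens;
    total.output_tokens += u.output_tokens;
    total.total_tokens += u.total_tokens;
    total.cache_read += u.input_token_details?.cache_read ?? 0;
  }
  return total;
}

// Hypothetical usage records from three calls (the last one reported no usage)
const totals = sumUsage([
  { input_tokens: 120, output_tokens: 40, total_tokens: 160 },
  { input_tokens: 130, output_tokens: 55, total_tokens: 185, input_token_details: { cache_read: 100 } },
  undefined,
]);

// Share of input tokens served from the prompt cache
const cacheHitRate = totals.cache_read / totals.input_tokens;
console.log(totals, cacheHitRate.toFixed(2));
```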

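Invocation config

When invoking a model, you can pass additional configuration as a RunnableConfig object in the config argument. This provides runtime control over execution behavior, callbacks, and metadata tracking. Common configuration options include: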

Invocation with config

const response = await model.invoke(
    "Tell me a joke",
    {
        runName: "joke_generation",      // Custom name for this run
        tags: ["humor", "demo"],          // Tags for categorization
        metadata: {"user_id": "123"},     // Custom metadata
        callbacks: [my_callback_handler], // Callback handlers
    }
)

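These configuration values are particularly useful when:

Debugging with LangSmith tracing
Implementing custom logging or monitoring
Controlling resource usage in production
Tracking invocations across complex pipelines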

Key configuration attributes

runName

string

Identifies this particular invocation in logs and traces. It is not inherited by sub-calls.

LangChain v1.0
