Control and customize agent execution at every step
Middleware provides a way to more tightly control what happens inside the agent. The core agent loop involves calling a model, letting it choose tools to execute, and then finishing when it calls no more tools:
Middleware exposes hooks before and after each of those steps:
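To illustrate where those hooks fire, here is a framework-free sketch of the loop. The `LoggingMiddleware` class and `run_agent_loop` function below are simplified stand-ins for illustration only, not LangChain APIs:

```python
class LoggingMiddleware:
    """Records when each hook fires around the model call."""

    def __init__(self):
        self.events = []

    def before_model(self, state):
        self.events.append("before_model")

    def after_model(self, state):
        self.events.append("after_model")


def run_agent_loop(middleware, model, state):
    """Call the model, running hooks around each call; stop when no tools are requested."""
    while True:
        for m in middleware:
            m.before_model(state)
        tool_calls = model(state)
        for m in middleware:
            m.after_model(state)
        if not tool_calls:  # model chose no tools: finish
            return state
        state["steps"] = state.get("steps", 0) + 1  # stand-in for executing tools


mw = LoggingMiddleware()
# A fake "model" that requests a tool once, then stops
calls = iter([["search"], []])
final = run_agent_loop([mw], lambda s: next(calls), {})
```

Each model call is bracketed by every middleware's `before_model` and `after_model`, which is what lets middleware observe and modify state at each step.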
Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions. See the human-in-the-loop documentation for complete examples and integration patterns.
```python
from langchain_anthropic import ChatAnthropic
from langchain_anthropic.middleware import AnthropicPromptCachingMiddleware
from langchain.agents import create_agent
from langchain.messages import HumanMessage

LONG_PROMPT = """
Please be a helpful assistant.

<Lots more context ...>
"""

agent = create_agent(
    model=ChatAnthropic(model="claude-sonnet-4-latest"),
    system_prompt=LONG_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# cache store
agent.invoke({"messages": [HumanMessage("Hi, my name is Bob")]})

# cache hit, system prompt is cached
agent.invoke({"messages": [HumanMessage("What's my name?")]})
```
Limit the number of model calls to prevent infinite loops or excessive costs.
Perfect for:
Preventing runaway agents from making too many API calls
Enforcing cost controls on production deployments
Testing agent behavior within specific call budgets
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelCallLimitMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[
        ModelCallLimitMiddleware(
            thread_limit=10,  # Max 10 calls per thread (across runs)
            run_limit=5,  # Max 5 calls per run (single invocation)
            exit_behavior="end",  # Or "error" to raise an exception
        ),
    ],
)
```
Automatically fall back to alternative models when the primary model fails.
Perfect for:
Building resilient agents that handle model outages
Cost optimization by falling back to cheaper models
Provider redundancy across OpenAI, Anthropic, etc.
```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelFallbackMiddleware

agent = create_agent(
    model="openai:gpt-4o",  # Primary model
    tools=[...],
    middleware=[
        ModelFallbackMiddleware(
            "openai:gpt-4o-mini",  # Try first on error
            "anthropic:claude-3-5-sonnet-20241022",  # Then this
        ),
    ],
)
```
Add todo list management capabilities for complex multi-step tasks.
This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.
```python
from langchain.agents import create_agent
from langchain.agents.middleware import TodoListMiddleware
from langchain.messages import HumanMessage

agent = create_agent(
    model="openai:gpt-4o",
    tools=[...],
    middleware=[TodoListMiddleware()],
)

result = agent.invoke({"messages": [HumanMessage("Help me refactor my codebase")]})
print(result["todos"])  # List of todo items with status tracking
```
Use an LLM to intelligently select relevant tools before calling the main model.
Perfect for:
Agents with many tools (10+) where most aren’t relevant per query
Reducing token usage by filtering irrelevant tools
Improving model focus and accuracy
```python
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolSelectorMiddleware

agent = create_agent(
    model="openai:gpt-4o",
    tools=[tool1, tool2, tool3, tool4, tool5, ...],  # Many tools
    middleware=[
        LLMToolSelectorMiddleware(
            model="openai:gpt-4o-mini",  # Use a cheaper model for selection
            max_tools=3,  # Limit to the 3 most relevant tools
            always_include=["search"],  # Always include certain tools
        ),
    ],
)
```
Emulate tool execution using an LLM for testing purposes, replacing actual tool calls with AI-generated responses.
Perfect for:
Testing agent behavior without executing real tools
Developing agents when external tools are unavailable or expensive
Prototyping agent workflows before implementing actual tools
```python
from langchain.agents import create_agent
from langchain.agents.middleware import LLMToolEmulator

agent = create_agent(
    model="openai:gpt-4o",
    tools=[get_weather, search_database, send_email],
    middleware=[
        # Emulate all tools by default
        LLMToolEmulator(),
        # Or emulate specific tools:
        # LLMToolEmulator(tools=["get_weather", "search_database"]),
        # Or use a custom model for emulation:
        # LLMToolEmulator(model="anthropic:claude-3-5-sonnet-latest"),
    ],
)
```
To exit early from middleware, return a dictionary with a `jump_to` key:
```python
from typing import Any

from langchain.agents.middleware import AgentMiddleware, AgentState
from langchain.messages import AIMessage

class EarlyExitMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime) -> dict[str, Any] | None:
        # Check some condition
        if should_exit(state):
            return {
                "messages": [AIMessage("Exiting early due to condition.")],
                "jump_to": "end",
            }
        return None
```
Available jump targets:
"end": Jump to the end of the agent execution
"tools": Jump to the tools node
"model": Jump to the model node (or the first before_model hook)
Important: When jumping from before_model or after_model, jumping to "model" will cause all before_model middleware to run again. To enable jumping, decorate your hook with @hook_config(can_jump_to=[...]):
Select relevant tools at runtime to improve performance and accuracy.
Benefits:
Shorter prompts - Reduce complexity by exposing only relevant tools
Better accuracy - Models choose correctly from fewer options
Permission control - Dynamically filter tools based on user access
```python
from typing import Callable

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse

class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Middleware to select relevant tools based on state/context."""
        # Select a small, relevant subset of tools based on state/context
        relevant_tools = select_relevant_tools(request.state, request.runtime)
        request.tools = relevant_tools
        return handler(request)

agent = create_agent(
    model="openai:gpt-4o",
    tools=all_tools,  # All available tools need to be registered upfront
    # Middleware can be used to select a smaller subset that's relevant for the given run
    middleware=[ToolSelectorMiddleware()],
)
```
Extended example: GitHub vs GitLab tool selection
```python
from dataclasses import dataclass
from typing import Callable, Literal

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain_core.tools import tool

@tool
def github_create_issue(repo: str, title: str) -> dict:
    """Create an issue in a GitHub repository."""
    return {"url": f"https://github.com/{repo}/issues/1", "title": title}

@tool
def gitlab_create_issue(project: str, title: str) -> dict:
    """Create an issue in a GitLab project."""
    return {"url": f"https://gitlab.com/{project}/-/issues/1", "title": title}

all_tools = [github_create_issue, gitlab_create_issue]

@dataclass
class Context:
    provider: Literal["github", "gitlab"]

class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Select tools based on the VCS provider."""
        provider = request.runtime.context.provider
        if provider == "gitlab":
            selected_tools = [t for t in request.tools if t.name == "gitlab_create_issue"]
        else:
            selected_tools = [t for t in request.tools if t.name == "github_create_issue"]
        request.tools = selected_tools
        return handler(request)

agent = create_agent(
    model="openai:gpt-4o",
    tools=all_tools,
    middleware=[ToolSelectorMiddleware()],
    context_schema=Context,
)

# Invoke with GitHub context
agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "Open an issue titled 'Bug: where are the cats' in the repository `its-a-cats-game`",
            }
        ]
    },
    context=Context(provider="github"),
)
```
Key points:
Register all tools upfront
Middleware selects the relevant subset per request