LimitWarnerProcessor¶
The LimitWarnerProcessor is a standalone history processor that injects warning
instructions into the next model request as configured limits approach.
It is useful when you want the agent to finish efficiently before:
- request count reaches its maximum
- current message history gets too close to the context window
- cumulative run token usage reaches a budget cap
Basic Usage¶
from pydantic_ai import Agent
from pydantic_ai_summarization import create_limit_warner_processor
processor = create_limit_warner_processor(
max_iterations=40,
max_context_tokens=100000,
max_total_tokens=200000,
)
agent = Agent(
"openai:gpt-4.1",
history_processors=[processor],
)
How It Works¶
- Removes any warning parts it generated on earlier turns
- Checks configured limits against the current
RunContext - Measures current context size from the cleaned message history
- Appends a new trailing
ModelRequestwhoseUserPromptPartcarries the warning text (a separate user turn, not extra system text on the last message)
The warning is ephemeral for context-window pressure: once compaction reduces the history size, the old context warning is removed and not re-injected.
Iteration and total-token warnings are monotonic: if those metrics are still above threshold, the processor injects an updated warning again on the next turn.
Configuration¶
from pydantic_ai_summarization import LimitWarnerProcessor
processor = LimitWarnerProcessor(
max_iterations=30,
max_context_tokens=90000,
max_total_tokens=180000,
warn_on=["iterations", "context_window", "total_tokens"],
warning_threshold=0.75,
critical_remaining_iterations=2,
)
warning_threshold¶
Default 0.7. The fraction of a configured limit at which warnings begin. For each enabled
metric the processor computes usage = current / limit; no warning is emitted until
usage >= warning_threshold. For example, with max_iterations=40 and the default 0.7,
warnings start once 28 of 40 requests have been used. Must satisfy 0 < x <= 1.
critical_remaining_iterations¶
Default 3. Warnings carry one of two severities, URGENT or CRITICAL. A warning starts
as URGENT and escalates to CRITICAL when the situation is dire:
- Iterations — escalates to
CRITICALonce the remaining request count (max_iterations - requests_used) is<= critical_remaining_iterations. - Context window and total tokens — escalate to
CRITICALonly once usage reaches or exceeds the limit (usage >= 1).
When multiple warnings are active, the combined message is URGENT only if every active
warning is URGENT; otherwise it is CRITICAL. The severity also changes the guidance text:
URGENT asks the agent to "complete the current task efficiently", while CRITICAL asks it
to "complete the current task immediately". critical_remaining_iterations must be
non-negative.
Ordering with Other Processors¶
Processor order matters:
- Put
LimitWarnerProcessorafter trimming or summarization processors if you want warnings to reflect the post-compaction history. - Put it before them only if you explicitly want a pre-compaction warning pass.