LimitWarnerProcessor¶

The LimitWarnerProcessor is a standalone history processor that injects warning instructions into the next model request as configured limits approach.

It is useful when you want the agent to finish efficiently before:

request count reaches its maximum
current message history gets too close to the context window
cumulative run token usage reaches a budget cap

Basic Usage¶

Python

from pydantic_ai import Agent
from pydantic_ai_summarization import create_limit_warner_processor

processor = create_limit_warner_processor(
    max_iterations=40,
    max_context_tokens=100000,
    max_total_tokens=200000,
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

How It Works¶

Removes any warning parts it generated on earlier turns
Checks configured limits against the current RunContext
Measures current context size from the cleaned message history
Appends a new trailing ModelRequest whose UserPromptPart carries the warning text (a separate user turn, not extra system text on the last message)

The warning is ephemeral for context-window pressure: once compaction reduces the history size, the old context warning is removed and not re-injected.

Iteration and total-token warnings are monotonic: if those metrics are still above threshold, the processor injects an updated warning again on the next turn.

Configuration¶

Python

from pydantic_ai_summarization import LimitWarnerProcessor

processor = LimitWarnerProcessor(
    max_iterations=30,
    max_context_tokens=90000,
    max_total_tokens=180000,
    warn_on=["iterations", "context_window", "total_tokens"],
    warning_threshold=0.75,
    critical_remaining_iterations=2,
)

`warning_threshold`¶

Default 0.7. The fraction of a configured limit at which warnings begin. For each enabled metric the processor computes usage = current / limit; no warning is emitted until usage >= warning_threshold. For example, with max_iterations=40 and the default 0.7, warnings start once 28 of 40 requests have been used. Must satisfy 0 < x <= 1.

`critical_remaining_iterations`¶

Default 3. Warnings carry one of two severities, URGENT or CRITICAL. A warning starts as URGENT and escalates to CRITICAL when the situation is dire:

Iterations — escalates to CRITICAL once the remaining request count (max_iterations - requests_used) is <= critical_remaining_iterations.
Context window and total tokens — escalate to CRITICAL only once usage reaches or exceeds the limit (usage >= 1).

When multiple warnings are active, the combined message is URGENT only if every active warning is URGENT; otherwise it is CRITICAL. The severity also changes the guidance text: URGENT asks the agent to "complete the current task efficiently", while CRITICAL asks it to "complete the current task immediately". critical_remaining_iterations must be non-negative.

Ordering with Other Processors¶

Processor order matters:

Put LimitWarnerProcessor after trimming or summarization processors if you want warnings to reflect the post-compaction history.
Put it before them only if you explicitly want a pre-compaction warning pass.