# Capabilities
The recommended way to add context management to a Pydantic AI agent is through capabilities, so no middleware wrappers are needed.
## Why Capabilities?

| Feature | Capabilities | Processor API |
|---|---|---|
| Native pydantic-ai | Yes | Yes |
| Tool output truncation | `ContextManagerCapability` | No |
| Auto-detect `max_tokens` | `ContextManagerCapability` | No |
| `compact()` outside run | `ContextManagerCapability` | No |
| Agent-triggered compaction | `ContextManagerCapability` (with `include_compact_tool=True`) | No |
| AgentSpec YAML | Yes | No |

## Available Capabilities
### ContextManagerCapability
Full context management: token tracking, auto-compression, and tool output truncation.
```python
from pydantic_ai_summarization import ContextManagerCapability

cap = ContextManagerCapability(
    max_tokens=100_000,           # Auto-detected if None
    compress_threshold=0.9,       # Compress at 90% usage
    max_tool_output_tokens=5000,  # Truncate large tool outputs
    include_compact_tool=True,    # Add compact_conversation tool
)
```
When `include_compact_tool=True`, the agent gets a `compact_conversation` tool with an optional `focus` argument. Calling the tool does not compress immediately; it triggers compression on the next model request, and `focus` guides the summary to prioritize specific topics.
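
A minimal sketch of that deferred-request pattern in plain Python: the tool only records the request, and the hook that runs before the next model request consumes it. `PendingCompaction`, `request` and `consume` are hypothetical names for illustration, not part of `pydantic_ai_summarization`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PendingCompaction:
    """Hypothetical state shared by the compact tool and the request hook."""

    requested: bool = False
    focus: Optional[str] = None

    def request(self, focus: Optional[str] = None) -> str:
        # Called by the tool: record the request, do not compress yet.
        self.requested = True
        self.focus = focus
        return "Compaction scheduled for the next model request."

    def consume(self) -> Tuple[bool, Optional[str]]:
        # Called before each model request: report, then clear, any pending request.
        requested, focus = self.requested, self.focus
        self.requested, self.focus = False, None
        return requested, focus


state = PendingCompaction()
state.request(focus="database schema decisions")
print(state.consume())  # (True, 'database schema decisions')
print(state.consume())  # (False, None)
```

Clearing the flag inside `consume` ensures one tool call produces exactly one compression.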
#### Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_tokens` | `None` | Token budget. When `None`, it is auto-detected from the model via genai-prices on the first run, falling back to `200_000` if detection fails. |
| `compress_threshold` | `0.9` | Fraction of `max_tokens` at which auto-compression fires. Must satisfy `0 < x <= 1` (validated in `__post_init__`). |
| `keep` | `("messages", 0)` | How much of the tail to preserve after compression. The default keeps no messages, so only the generated summary survives. |
| `summarization_model` | `"openai:gpt-4.1-mini"` | Model used to generate the summary. |
| `token_counter` | `count_tokens_approximately` | Callable used to measure context size; may be sync or async (`TokenCounter`). |
| `summary_prompt` | `DEFAULT_SUMMARY_PROMPT` | Prompt template used when summarizing. |
| `max_tool_output_tokens` | `None` | When set, tool outputs longer than `max_tool_output_tokens * 4` characters (about 4 characters per token) are truncated, keeping the head and tail. `None` disables truncation. |
| `tool_output_head_lines` | `5` | Lines kept from the start of a truncated tool output. |
| `tool_output_tail_lines` | `5` | Lines kept from the end of a truncated tool output. |
| `on_usage_update` | `None` | Callback `(pct, current, max_tokens)` invoked on every model request (and again after compression) for live token tracking. |
| `on_before_compress` | `None` | Callback `(messages, cutoff_index)` invoked just before compression runs. |
| `on_after_compress` | `None` | Callback `(messages)`; if it returns a `str`, that text is re-injected into the first request as a `SystemPromptPart`. |
| `include_compact_tool` | `False` | When `True`, registers the `compact_conversation` agent tool. |

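
The truncation parameters above can be illustrated with a small sketch on a plain string. It converts the token limit to characters at 4 per token and keeps head and tail lines; the function name, the omission-marker text, and the fallback for short-but-long-lined outputs are assumptions, and the library's actual output format may differ.

```python
from typing import Optional


def truncate_tool_output(
    text: str,
    max_tool_output_tokens: Optional[int],
    head_lines: int = 5,
    tail_lines: int = 5,
) -> str:
    """Hypothetical head/tail truncation of a tool output string."""
    if max_tool_output_tokens is None:
        return text  # None disables truncation
    max_chars = max_tool_output_tokens * 4  # roughly 4 characters per token
    if len(text) <= max_chars:
        return text
    lines = text.splitlines()
    if len(lines) <= head_lines + tail_lines:
        # Too few lines to drop any; cut by characters instead (assumption).
        return text[:max_chars]
    omitted = len(lines) - head_lines - tail_lines
    head = lines[:head_lines]
    tail = lines[-tail_lines:] if tail_lines > 0 else []
    return "\n".join([*head, f"... [{omitted} lines truncated] ...", *tail])


output = "\n".join(f"row {i}" for i in range(100))
short = truncate_tool_output(output, max_tool_output_tokens=50)
print(short.splitlines()[5])  # ... [90 lines truncated] ...
```

Keeping whole lines from both ends preserves, for example, a command's header and its final error message, which tend to matter most.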
#### Auto-compression mechanism
On every before_model_request, the capability counts the tokens in the current history
and computes pct = total / max_tokens. It calls on_usage_update(pct, total, max_tokens)
if provided. Compression runs when pct >= compress_threshold or when a manual
compaction has been requested (via the compact_conversation tool or
request_compact()).
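
The decision described above fits in a few lines. This is a sketch only: `check_compression` is a hypothetical helper, and `total_tokens` stands in for whatever the configured `token_counter` returns for the current history.

```python
from typing import Callable, List, Optional, Tuple

UsageCallback = Callable[[float, int, int], None]


def check_compression(
    total_tokens: int,
    max_tokens: int,
    compress_threshold: float,
    manual_request: bool = False,
    on_usage_update: Optional[UsageCallback] = None,
) -> bool:
    """Return True when compression should run before this model request."""
    pct = total_tokens / max_tokens
    if on_usage_update is not None:
        on_usage_update(pct, total_tokens, max_tokens)  # live token tracking
    # Compress at the threshold, or when a manual compaction is pending.
    return pct >= compress_threshold or manual_request


updates: List[Tuple[float, int, int]] = []
print(check_compression(85_000, 100_000, 0.9, on_usage_update=lambda *args: updates.append(args)))  # False
print(check_compression(90_000, 100_000, 0.9))  # True
print(check_compression(10_000, 100_000, 0.9, manual_request=True))  # True
print(updates)  # [(0.85, 85000, 100000)]
```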
Internally it drives a `SummarizationProcessor` configured with `trigger=("fraction", compress_threshold)`, the capability's `keep`, and `max_input_tokens=max_tokens`. After compression, the history therefore consists of the generated summary followed by only the tail specified by `keep`.
### SummarizationCapability
LLM-based history compression:
```python
from pydantic_ai_summarization import SummarizationCapability

cap = SummarizationCapability(
    trigger=("messages", 50),
    keep=("messages", 10),
)
```
> **Capability defaults differ from the standalone processor.** `SummarizationCapability` ships its own defaults (`model="openai:gpt-4.1-mini"`, `trigger=("messages", 50)`, and `keep=("messages", 10)`), which are not necessarily the same as the `SummarizationProcessor` defaults. Set these parameters explicitly if you need a specific configuration.

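
To make the `trigger`/`keep` pair concrete, the sketch below applies message-count semantics to plain strings: once the history reaches the trigger count, everything except the last `keep` messages is replaced by one summary. `maybe_summarize` is hypothetical, `summarize` is a placeholder for the LLM call, and the `>=` trigger comparison is an assumption.

```python
from typing import Callable, List, Tuple


def maybe_summarize(
    messages: List[str],
    trigger: Tuple[str, int],
    keep: Tuple[str, int],
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Hypothetical message-count summarization: one summary plus the last N messages."""
    if trigger[0] != "messages" or keep[0] != "messages":
        raise ValueError("this sketch only handles ('messages', n) settings")
    if len(messages) < trigger[1]:
        return messages  # below the trigger: history is left untouched
    cutoff = len(messages) - keep[1]
    head, tail = messages[:cutoff], messages[cutoff:]
    return [summarize(head), *tail]


history = [f"msg {i}" for i in range(50)]
compressed = maybe_summarize(
    history, ("messages", 50), ("messages", 10),
    summarize=lambda ms: f"[summary of {len(ms)} messages]",
)
print(compressed[0])    # [summary of 40 messages]
print(len(compressed))  # 11
```

With `keep=("messages", 0)` the tail is empty and only the summary remains, matching the `ContextManagerCapability` default described above.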
### SlidingWindowCapability
Zero-cost message trimming:
```python
from pydantic_ai_summarization import SlidingWindowCapability

cap = SlidingWindowCapability(
    trigger=("messages", 100),
    keep=("messages", 50),
)
```
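
Sliding-window trimming is zero-cost because no model call is involved: once the history reaches the trigger count, the oldest messages are simply dropped. A minimal sketch on plain lists follows; the function name and the `>=` comparison are assumptions. It ignores message structure, whereas a real implementation would likely also need to avoid separating a tool call from its tool return.

```python
from typing import List, TypeVar

T = TypeVar("T")


def sliding_window(messages: List[T], trigger: int, keep: int) -> List[T]:
    """Hypothetical trim: keep only the newest `keep` messages once `trigger` is reached."""
    if len(messages) < trigger:
        return messages
    return messages[-keep:] if keep > 0 else []


history = list(range(120))
trimmed = sliding_window(history, trigger=100, keep=50)
print(len(trimmed), trimmed[0])  # 50 70
```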
### LimitWarnerCapability
Warn the agent before it hits its limits:
```python
from pydantic_ai_summarization import LimitWarnerCapability

cap = LimitWarnerCapability(
    max_iterations=40,
    max_context_tokens=100_000,
    max_total_tokens=200_000,
)
```
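
This section does not say when the warnings fire, so the sketch below assumes a hypothetical `warn_at` fraction (0.8) applied to each of the three configured limits and returns the warning text an agent could be shown. `limit_warnings` and `warn_at` are illustrative names, not part of the library.

```python
from typing import List, Optional


def limit_warnings(
    iterations: int,
    context_tokens: int,
    total_tokens: int,
    max_iterations: Optional[int] = 40,
    max_context_tokens: Optional[int] = 100_000,
    max_total_tokens: Optional[int] = 200_000,
    warn_at: float = 0.8,  # hypothetical warning fraction
) -> List[str]:
    """Return one warning per configured limit whose usage has reached `warn_at`."""
    checks = [
        ("iterations", iterations, max_iterations),
        ("context tokens", context_tokens, max_context_tokens),
        ("total tokens", total_tokens, max_total_tokens),
    ]
    warnings: List[str] = []
    for name, used, limit in checks:
        if limit is None:
            continue  # this limit is not configured
        if used >= warn_at * limit:
            warnings.append(f"Approaching {name} limit: {used}/{limit}")
    return warnings


print(limit_warnings(iterations=35, context_tokens=50_000, total_tokens=170_000))
# ['Approaching iterations limit: 35/40', 'Approaching total tokens limit: 170000/200000']
```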