Core Concepts¶
summarization-pydantic-ai provides automatic conversation context management for pydantic-ai agents.
Overview¶
When agent conversations grow long, they can exceed the model's context window. This library provides two approaches:
Capabilities (Recommended)¶
Native pydantic-ai capabilities — plug and play:
| Capability | Description | Cost |
|---|---|---|
| ContextManagerCapability | Full context management + tool truncation | Per compression |
| SummarizationCapability | LLM-based history compression | High |
| SlidingWindowCapability | Zero-cost message trimming | Zero |
| LimitWarnerCapability | Warning injection before limits | Zero |
Standalone Processors¶
Lower-level API for use with history_processors=:
| Processor | Description | Cost |
|---|---|---|
| SummarizationProcessor | Uses LLM to summarize old messages | High |
| SlidingWindowProcessor | Simply discards old messages | Zero |
| LimitWarnerProcessor | Injects finish-soon warnings | Zero |
The compaction processors:
- Monitor conversation length (messages or tokens)
- Trigger processing when thresholds are reached
- Find safe cutoff that preserves tool call pairs
- Process older messages (summarize or discard)
- Preserve recent messages for context continuity
The warning processor complements those strategies by notifying the agent before limits are hit, without changing the message history structure.
Key Components¶
SummarizationProcessor¶
Intelligent summarization using an LLM:
Python
from pydantic_ai import Agent
from pydantic_ai_summarization import SummarizationProcessor
processor = SummarizationProcessor(
model="openai:gpt-4.1",
trigger=("tokens", 100000),
keep=("messages", 20),
)
agent = Agent(
"openai:gpt-4.1",
history_processors=[processor],
)
SlidingWindowProcessor¶
Zero-cost trimming without LLM:
Python
from pydantic_ai import Agent
from pydantic_ai_summarization import SlidingWindowProcessor
processor = SlidingWindowProcessor(
trigger=("messages", 100),
keep=("messages", 50),
)
agent = Agent(
"openai:gpt-4.1",
history_processors=[processor],
)
LimitWarnerProcessor¶
Warning-only processor with no LLM cost:
Python
from pydantic_ai import Agent
from pydantic_ai_summarization import LimitWarnerProcessor
processor = LimitWarnerProcessor(
max_iterations=40,
max_context_tokens=100000,
max_total_tokens=200000,
)
agent = Agent(
"openai:gpt-4.1",
history_processors=[processor],
)
Factory Functions¶
Convenience functions with sensible defaults:
Python
from pydantic_ai_summarization import (
create_summarization_processor,
create_sliding_window_processor,
create_limit_warner_processor,
)
# Summarization with defaults
summarizer = create_summarization_processor(
trigger=("messages", 50),
keep=("messages", 10),
)
# Sliding window with defaults
window = create_sliding_window_processor(
trigger=("messages", 100),
keep=("messages", 50),
)
# Limit warnings with defaults for the configured caps
warner = create_limit_warner_processor(
max_iterations=40,
max_context_tokens=100000,
)
How SummarizationProcessor Works¶
Text Only
┌─────────────────────────────────────────────────────────────┐
│ Agent Conversation │
├─────────────────────────────────────────────────────────────┤
│ Message 1 │ Message 2 │ ... │ Message N-1 │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SummarizationProcessor │
├─────────────────────────────────────────────────────────────┤
│ 1. Count tokens │
│ 2. Check triggers (messages/tokens/fraction) │
│ 3. Find safe cutoff point │
│ 4. Generate summary via LLM │
│ 5. Replace old messages with summary │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Processed Messages │
├─────────────────────────────────────────────────────────────┤
│ Summary Message │ Message N-19 │ ... │ Message N │
└───────────────────┴────────────────┴───────┴────────────────┘
How SlidingWindowProcessor Works¶
Text Only
┌─────────────────────────────────────────────────────────────┐
│ Agent Conversation │
├─────────────────────────────────────────────────────────────┤
│ Message 1 │ Message 2 │ ... │ Message N-1 │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SlidingWindowProcessor │
├─────────────────────────────────────────────────────────────┤
│ 1. Count messages/tokens │
│ 2. Check triggers │
│ 3. Find safe cutoff point │
│ 4. Discard old messages (no LLM call) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Processed Messages │
├─────────────────────────────────────────────────────────────┤
│ Message N-49 │ Message N-48 │ ... │ Message N │
└────────────────┴────────────────┴───────┴───────────────────┘
Tool Call Safety¶
Both processors ensure tool call/response pairs are never split:
Text Only
❌ Bad cutoff (splits pair):
[Tool Call: search] | [Tool Result: found 5 items] [User: thanks]
↑ cutoff here
✅ Good cutoff (preserves pair):
[User: find items] | [Tool Call: search] [Tool Result: found 5 items]
↑ cutoff here
Choosing a Processor¶
| Requirement | Recommended |
|---|---|
| Context quality is critical | SummarizationProcessor |
| Speed and cost are priorities | SlidingWindowProcessor |
| You want the agent to finish before a hard limit | LimitWarnerProcessor |
| Running many parallel conversations | SlidingWindowProcessor |
| Long-running coding sessions | SummarizationProcessor |
| Simple chatbots | SlidingWindowProcessor |
| Deterministic behavior needed | SlidingWindowProcessor |
Next Steps¶
- Learn about Triggers
- See the SummarizationProcessor details
- See the SlidingWindowProcessor details
- See the LimitWarnerProcessor details
- View Examples