Core Concepts¶

summarization-pydantic-ai provides automatic conversation context management for pydantic-ai agents.

Overview¶

When agent conversations grow long, they can exceed the model's context window. This library provides two approaches:

Capabilities (Recommended)¶

Native pydantic-ai capabilities — plug and play:

Capability	Description	Cost
`ContextManagerCapability`	Full context management + tool truncation	Per compression
`SummarizationCapability`	LLM-based history compression	High
`SlidingWindowCapability`	Zero-cost message trimming	Zero
`LimitWarnerCapability`	Warning injection before limits	Zero

Standalone Processors¶

Lower-level API for use with history_processors=:

Processor	Description	Cost
`SummarizationProcessor`	Uses LLM to summarize old messages	High
`SlidingWindowProcessor`	Simply discards old messages	Zero
`LimitWarnerProcessor`	Injects finish-soon warnings	Zero

The compaction processors:

Monitor conversation length (messages or tokens)
Trigger processing when thresholds are reached
Find safe cutoff that preserves tool call pairs
Process older messages (summarize or discard)
Preserve recent messages for context continuity

The warning processor complements those strategies by notifying the agent before limits are hit, without changing the message history structure.

Key Components¶

SummarizationProcessor¶

Intelligent summarization using an LLM:

Python

from pydantic_ai import Agent
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4.1",
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

SlidingWindowProcessor¶

Zero-cost trimming without LLM:

Python

from pydantic_ai import Agent
from pydantic_ai_summarization import SlidingWindowProcessor

processor = SlidingWindowProcessor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

LimitWarnerProcessor¶

Warning-only processor with no LLM cost:

Python

from pydantic_ai import Agent
from pydantic_ai_summarization import LimitWarnerProcessor

processor = LimitWarnerProcessor(
    max_iterations=40,
    max_context_tokens=100000,
    max_total_tokens=200000,
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

Factory Functions¶

Convenience functions with sensible defaults:

Python

from pydantic_ai_summarization import (
    create_summarization_processor,
    create_sliding_window_processor,
    create_limit_warner_processor,
)

# Summarization with defaults
summarizer = create_summarization_processor(
    trigger=("messages", 50),
    keep=("messages", 10),
)

# Sliding window with defaults
window = create_sliding_window_processor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

# Limit warnings with defaults for the configured caps
warner = create_limit_warner_processor(
    max_iterations=40,
    max_context_tokens=100000,
)

How SummarizationProcessor Works¶

Text Only

┌─────────────────────────────────────────────────────────────┐
│                    Agent Conversation                        │
├─────────────────────────────────────────────────────────────┤
│  Message 1   │  Message 2   │  ...  │  Message N-1  │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 SummarizationProcessor                       │
├─────────────────────────────────────────────────────────────┤
│  1. Count tokens                                            │
│  2. Check triggers (messages/tokens/fraction)               │
│  3. Find safe cutoff point                                  │
│  4. Generate summary via LLM                                │
│  5. Replace old messages with summary                       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Processed Messages                        │
├─────────────────────────────────────────────────────────────┤
│  Summary Message  │  Message N-19  │  ...  │  Message N     │
└───────────────────┴────────────────┴───────┴────────────────┘

How SlidingWindowProcessor Works¶

Text Only

┌─────────────────────────────────────────────────────────────┐
│                    Agent Conversation                        │
├─────────────────────────────────────────────────────────────┤
│  Message 1   │  Message 2   │  ...  │  Message N-1  │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                 SlidingWindowProcessor                       │
├─────────────────────────────────────────────────────────────┤
│  1. Count messages/tokens                                   │
│  2. Check triggers                                          │
│  3. Find safe cutoff point                                  │
│  4. Discard old messages (no LLM call)                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Processed Messages                        │
├─────────────────────────────────────────────────────────────┤
│  Message N-49  │  Message N-48  │  ...  │  Message N        │
└────────────────┴────────────────┴───────┴───────────────────┘

Tool Call Safety¶

Both processors ensure tool call/response pairs are never split:

Text Only

❌ Bad cutoff (splits pair):
[Tool Call: search] | [Tool Result: found 5 items] [User: thanks]
                    ↑ cutoff here

✅ Good cutoff (preserves pair):
[User: find items] | [Tool Call: search] [Tool Result: found 5 items]
                   ↑ cutoff here

Choosing a Processor¶

Requirement	Recommended
Context quality is critical	SummarizationProcessor
Speed and cost are priorities	SlidingWindowProcessor
You want the agent to finish before a hard limit	LimitWarnerProcessor
Running many parallel conversations	SlidingWindowProcessor
Long-running coding sessions	SummarizationProcessor
Simple chatbots	SlidingWindowProcessor
Deterministic behavior needed	SlidingWindowProcessor

Next Steps¶

Learn about Triggers
See the SummarizationProcessor details
See the SlidingWindowProcessor details
See the LimitWarnerProcessor details
View Examples