Core Concepts

summarization-pydantic-ai provides automatic conversation context management for pydantic-ai agents.

Overview

When agent conversations grow long, they can exceed the model's context window. This library provides two approaches:

Native pydantic-ai capabilities — plug and play:

| Capability | Description | Cost |
| --- | --- | --- |
| ContextManagerCapability | Full context management + tool truncation | Per compression |
| SummarizationCapability | LLM-based history compression | High |
| SlidingWindowCapability | Zero-cost message trimming | Zero |
| LimitWarnerCapability | Warning injection before limits | Zero |

Standalone Processors

Lower-level API for use with history_processors=:

| Processor | Description | Cost |
| --- | --- | --- |
| SummarizationProcessor | Uses LLM to summarize old messages | High |
| SlidingWindowProcessor | Simply discards old messages | Zero |
| LimitWarnerProcessor | Injects finish-soon warnings | Zero |

The compaction processors:

  1. Monitor conversation length (messages or tokens)
  2. Trigger processing when thresholds are reached
  3. Find a safe cutoff that keeps tool call/response pairs together
  4. Process older messages (summarize or discard)
  5. Preserve recent messages for context continuity
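
Steps 1 and 2 of the list above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `estimate_tokens`, `should_compact`, the 4-characters-per-token heuristic, and the `>=` comparison are all assumptions made for the example. Only the `("messages", n)` / `("tokens", n)` tuple shape comes from the configuration shown on this page.

```python
from typing import Literal

# Same tuple shape as the `trigger=` argument used on this page.
Trigger = tuple[Literal["messages", "tokens"], int]


def estimate_tokens(messages: list[str]) -> int:
    # Crude heuristic (an assumption): roughly 4 characters per token.
    return sum(len(m) for m in messages) // 4


def should_compact(messages: list[str], trigger: Trigger) -> bool:
    # Hypothetical helper: True once the configured threshold is reached.
    kind, limit = trigger
    if kind == "messages":
        return len(messages) >= limit
    if kind == "tokens":
        return estimate_tokens(messages) >= limit
    raise ValueError(f"unknown trigger kind: {kind!r}")


history = [f"message {i}" for i in range(120)]
print(should_compact(history, ("messages", 100)))    # True: 120 messages
print(should_compact(history, ("tokens", 100_000)))  # False: the history is tiny
```

A real token count would come from the model's tokenizer or usage data; the character heuristic only keeps the sketch self-contained.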

The warning processor complements those strategies by notifying the agent before limits are hit, without changing the message history structure.

Key Components

SummarizationProcessor

Compresses older messages into an LLM-generated summary:

Python
from pydantic_ai import Agent
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4.1",
    trigger=("tokens", 100000),
    keep=("messages", 20),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

SlidingWindowProcessor

Zero-cost trimming with no LLM call:

Python
from pydantic_ai import Agent
from pydantic_ai_summarization import SlidingWindowProcessor

processor = SlidingWindowProcessor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)

LimitWarnerProcessor

Warning-only processor with no LLM cost:

Python
from pydantic_ai import Agent
from pydantic_ai_summarization import LimitWarnerProcessor

processor = LimitWarnerProcessor(
    max_iterations=40,
    max_context_tokens=100000,
    max_total_tokens=200000,
)

agent = Agent(
    "openai:gpt-4.1",
    history_processors=[processor],
)
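
To make the idea of a finish-soon warning concrete, here is a small sketch of the kind of check such a processor might perform. It is not the library's code: `limit_warning`, the `warn_at` fraction of 0.8, and the warning wording are hypothetical, chosen only to illustrate warning before a limit is reached.

```python
from typing import Optional


def limit_warning(
    iterations: int,
    max_iterations: int,
    warn_at: float = 0.8,  # hypothetical threshold: warn at 80% of the cap
) -> Optional[str]:
    # Return warning text once usage crosses the threshold, else None.
    if iterations >= max_iterations * warn_at:
        remaining = max(max_iterations - iterations, 0)
        return f"Warning: {remaining} iterations left before the limit; finish soon."
    return None


print(limit_warning(10, 40))  # None: well below the cap
print(limit_warning(35, 40))  # Warning: 5 iterations left before the limit; finish soon.
```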

Factory Functions

Convenience functions with sensible defaults:

Python
from pydantic_ai_summarization import (
    create_summarization_processor,
    create_sliding_window_processor,
    create_limit_warner_processor,
)

# Summarization with defaults
summarizer = create_summarization_processor(
    trigger=("messages", 50),
    keep=("messages", 10),
)

# Sliding window with defaults
window = create_sliding_window_processor(
    trigger=("messages", 100),
    keep=("messages", 50),
)

# Limit warnings with defaults for the configured caps
warner = create_limit_warner_processor(
    max_iterations=40,
    max_context_tokens=100000,
)

How SummarizationProcessor Works

Text Only
┌─────────────────────────────────────────────────────────────┐
│                    Agent Conversation                        │
├─────────────────────────────────────────────────────────────┤
│  Message 1   │  Message 2   │  ...  │  Message N-1  │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
┌─────────────────────────────────────────────────────────────┐
│                 SummarizationProcessor                       │
├─────────────────────────────────────────────────────────────┤
│  1. Count tokens                                            │
│  2. Check triggers (messages/tokens/fraction)               │
│  3. Find safe cutoff point                                  │
│  4. Generate summary via LLM                                │
│  5. Replace old messages with summary                       │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                    Processed Messages                        │
├─────────────────────────────────────────────────────────────┤
│  Summary Message  │  Message N-19  │  ...  │  Message N     │
└───────────────────┴────────────────┴───────┴────────────────┘
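
The flow in the diagram can be approximated with a few lines of Python. This is a sketch under stated assumptions: messages are plain strings, `summarize_history` is a hypothetical helper, and `fake_summarize` stands in for the LLM call. Tool-pair safety is left out to keep the example short.

```python
from typing import Callable


def summarize_history(
    messages: list[str],
    keep: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    # Split the history into an older part and the `keep` most recent messages.
    cutoff = max(len(messages) - keep, 0)
    old, recent = messages[:cutoff], messages[cutoff:]
    if not old:
        return messages  # nothing old enough to summarize
    # Replace the older part with a single summary message.
    return [summarize(old)] + recent


def fake_summarize(old: list[str]) -> str:
    # Stand-in for an LLM call so the sketch runs offline.
    return f"[summary of {len(old)} messages]"


history = [f"Message {i}" for i in range(1, 31)]  # N = 30
compacted = summarize_history(history, keep=20, summarize=fake_summarize)
print(compacted[0])    # [summary of 10 messages]
print(compacted[1])    # Message 11, i.e. Message N-19
print(len(compacted))  # 21
```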

How SlidingWindowProcessor Works

Text Only
┌─────────────────────────────────────────────────────────────┐
│                    Agent Conversation                        │
├─────────────────────────────────────────────────────────────┤
│  Message 1   │  Message 2   │  ...  │  Message N-1  │ Msg N │
└──────────────┴──────────────┴───────┴───────────────┴───────┘
┌─────────────────────────────────────────────────────────────┐
│                 SlidingWindowProcessor                       │
├─────────────────────────────────────────────────────────────┤
│  1. Count messages/tokens                                   │
│  2. Check triggers                                          │
│  3. Find safe cutoff point                                  │
│  4. Discard old messages (no LLM call)                      │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                    Processed Messages                        │
├─────────────────────────────────────────────────────────────┤
│  Message N-49  │  Message N-48  │  ...  │  Message N        │
└────────────────┴────────────────┴───────┴───────────────────┘
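
A sliding window reduces to a slice once the trigger fires. The sketch below assumes plain string messages and a message-count trigger; `sliding_window` is a hypothetical name, and the tool-pair adjustment is omitted.

```python
def sliding_window(messages: list[str], trigger: int, keep: int) -> list[str]:
    # Below the trigger, the history is returned unchanged.
    if len(messages) < trigger:
        return messages
    # Otherwise keep only the `keep` most recent messages (no LLM call).
    return messages[max(len(messages) - keep, 0):]


history = [f"Message {i}" for i in range(1, 101)]  # N = 100
trimmed = sliding_window(history, trigger=100, keep=50)
print(len(trimmed))  # 50
print(trimmed[0])    # Message 51, i.e. Message N-49
```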

Tool Call Safety

Both compaction processors (SummarizationProcessor and SlidingWindowProcessor) ensure tool call/response pairs are never split:

Text Only
❌ Bad cutoff (splits pair):
[Tool Call: search] | [Tool Result: found 5 items] [User: thanks]
                    ↑ cutoff here

✅ Good cutoff (preserves pair):
[User: find items] | [Tool Call: search] [Tool Result: found 5 items]
                   ↑ cutoff here
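
One way to find such a cutoff is to start from the naive position and move it earlier until it no longer lands on a tool result. The sketch below is an illustration, not the library's algorithm: the `Msg` type and `safe_cutoff` function are hypothetical, and it assumes each tool result directly follows its tool call.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    kind: str  # "user", "assistant", "tool_call", or "tool_result"
    text: str


def safe_cutoff(messages: list[Msg], keep: int) -> int:
    # Index of the first message to keep, starting from the naive cutoff.
    cutoff = max(len(messages) - keep, 0)
    # If the first kept message is a tool result, step back so the
    # matching tool call is kept as well.
    while 0 < cutoff < len(messages) and messages[cutoff].kind == "tool_result":
        cutoff -= 1
    return cutoff


history = [
    Msg("user", "find items"),
    Msg("tool_call", "search"),
    Msg("tool_result", "found 5 items"),
    Msg("user", "thanks"),
]
cut = safe_cutoff(history, keep=2)
print(cut)                              # 1: moved back from 2 to keep the pair
print([m.kind for m in history[cut:]])  # ['tool_call', 'tool_result', 'user']
```

Moving the cutoff earlier keeps slightly more than `keep` messages; an implementation could equally move it later and drop the whole pair. Which direction the library takes is not stated here.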

Choosing a Processor

| Requirement | Recommended |
| --- | --- |
| Context quality is critical | SummarizationProcessor |
| Speed and cost are priorities | SlidingWindowProcessor |
| You want the agent to finish before a hard limit | LimitWarnerProcessor |
| Running many parallel conversations | SlidingWindowProcessor |
| Long-running coding sessions | SummarizationProcessor |
| Simple chatbots | SlidingWindowProcessor |
| Deterministic behavior needed | SlidingWindowProcessor |

Next Steps