Predict-Calibrate Extraction
Predict-calibrate extraction is the core innovation in memv. It is based on Nemori.
The Problem
Traditional memory extraction asks: "What facts are in this conversation?"
This extracts everything, including information the knowledge base already holds. Over time, the knowledge base fills with redundant entries, and retrieval degrades.
The Approach
Predict-calibrate flips the question: "What did I fail to predict?"
```mermaid
flowchart TD
    E[New Episode] --> R[Retrieve existing knowledge<br>relevant to episode]
    R --> P[Predict what episode<br>should contain]
    P --> C[Compare prediction<br>to actual content]
    C --> X[Extract only what<br>was unpredicted]
    X --> N[New facts]
    X --> U[Updates to existing facts]
    X --> D[Contradictions]
```
Step by Step
- Retrieve — Find existing knowledge relevant to the new episode
- Predict — Given what we already know, predict what this conversation contains
- Compare — Diff the prediction against the actual episode content
- Extract — Only store what the prediction missed
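
The four steps above can be sketched as a small pipeline. This is a toy illustration, not memv's implementation: `predict_calibrate`, `overlap_retrieve`, and `echo_predict` are hypothetical names, and exact string matching stands in for the prediction and comparison that a real system would presumably delegate to an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PredictCalibrateResult:
    prediction: list[str]  # what we expected the episode to say
    extracted: list[str]   # what the prediction missed


def predict_calibrate(
    episode: list[str],
    knowledge_base: list[str],
    retrieve: Callable[[list[str], list[str]], list[str]],
    predict: Callable[[list[str]], list[str]],
) -> PredictCalibrateResult:
    # 1. Retrieve: existing knowledge relevant to the new episode.
    relevant = retrieve(episode, knowledge_base)
    # 2. Predict: what the episode should contain, given that knowledge.
    prediction = predict(relevant)
    # 3. Compare: diff the prediction against the actual content
    #    (hypothetical stand-in: case-insensitive exact match).
    predicted = {s.lower() for s in prediction}
    # 4. Extract: keep only the statements the prediction missed.
    extracted = [s for s in episode if s.lower() not in predicted]
    return PredictCalibrateResult(prediction, extracted)


def overlap_retrieve(episode: list[str], knowledge_base: list[str]) -> list[str]:
    # Toy retrieval: any known statement sharing a word with the episode.
    words = {w for s in episode for w in s.lower().split()}
    return [k for k in knowledge_base if words & set(k.lower().split())]


def echo_predict(relevant: list[str]) -> list[str]:
    # Toy predictor: expect the episode to restate what is already known.
    return list(relevant)


kb = ["User works at Anthropic"]
episode = ["User works at Anthropic", "User prefers tea"]
result = predict_calibrate(episode, kb, overlap_retrieve, echo_predict)
print(result.extracted)  # ['User prefers tea']
```

Passing `retrieve` and `predict` in as callables keeps the control flow visible while leaving the expensive parts swappable.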
What Gets Extracted
Each extracted piece of knowledge is classified:
| Type | Meaning | Example |
|---|---|---|
| new | Previously unknown information | "User works at Anthropic" |
| update | Revision of existing knowledge | "User now works at OpenAI" (was Anthropic) |
| contradiction | Conflicts with prior knowledge | "User prefers tea" (we thought coffee) |
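
One way to picture this classification is as a tagged record. The sketch below is an assumption for illustration only: `KnowledgeType`, `ExtractedKnowledge`, and the `prior` field are hypothetical names, not memv's actual data model. Only the three type labels and the example statements come from the table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class KnowledgeType(Enum):
    NEW = "new"
    UPDATE = "update"
    CONTRADICTION = "contradiction"


@dataclass
class ExtractedKnowledge:
    statement: str
    type: KnowledgeType
    # Hypothetical field: the existing statement this one revises or
    # conflicts with. None for brand-new information.
    prior: Optional[str] = None


examples = [
    ExtractedKnowledge("User works at Anthropic", KnowledgeType.NEW),
    ExtractedKnowledge(
        "User now works at OpenAI",
        KnowledgeType.UPDATE,
        prior="User works at Anthropic",
    ),
    ExtractedKnowledge(
        "User prefers tea",
        KnowledgeType.CONTRADICTION,
        prior="User prefers coffee",
    ),
]

for item in examples:
    print(item.type.value, "->", item.statement)
```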
Why It Works
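Importance emerges naturally from prediction error, with no explicit LLM scoring step: whatever the prediction missed is, by construction, what the knowledge base did not already cover.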
- First conversation: Everything is unpredicted, so everything gets extracted
- Subsequent conversations: Only genuinely novel information survives the prediction filter
- Repeated topics: Already-known facts are predicted correctly and skipped
This keeps the knowledge base lean: each stored statement adds something the existing statements could not predict.
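
The three cases above can be traced with a toy simulation. It assumes, purely for illustration, that a known statement is always predicted exactly, so the "prediction filter" reduces to a set lookup. The function name `extract_unpredicted` and the sample statements are made up for this sketch.

```python
def extract_unpredicted(episode: list[str], known: set[str]) -> list[str]:
    # Toy prediction: every already-known statement is predicted exactly,
    # so only statements outside `known` count as prediction errors.
    return [s for s in episode if s not in known]


known: set[str] = set()
episodes = [
    ["User works at Anthropic", "User prefers coffee"],   # first conversation
    ["User works at Anthropic", "User lives in Berlin"],  # one novel fact
    ["User prefers coffee", "User lives in Berlin"],      # repeated topics only
]

counts = []
for episode in episodes:
    extracted = extract_unpredicted(episode, known)
    known.update(extracted)  # store only what the prediction missed
    counts.append(len(extracted))

print(counts)  # [2, 1, 0]
```

The number of extracted statements falls as the knowledge base catches up with the conversation, while the knowledge base itself grows only by the novel statements.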
Configuration
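The max_statements_for_prediction parameter controls how many existing knowledge statements are used during prediction. A higher value gives the predictor more context, which tends to improve predictions, but every additional statement increases the token cost of each prediction call.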
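
The trade-off can be made concrete with a small sketch. Only the parameter name `max_statements_for_prediction` comes from this page. The helper `build_prediction_context`, the relevance scores, and the idea of ranking by score before truncating are assumptions, since the page does not say how memv selects the statements.

```python
def build_prediction_context(
    scored_statements: list[tuple[float, str]],
    max_statements_for_prediction: int,
) -> list[str]:
    # Keep the highest-scoring statements, up to the configured cap.
    ranked = sorted(scored_statements, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:max_statements_for_prediction]]


# Made-up relevance scores for illustration.
scored = [
    (0.91, "User works at Anthropic"),
    (0.40, "User prefers coffee"),
    (0.75, "User lives in Berlin"),
]

small = build_prediction_context(scored, 1)
large = build_prediction_context(scored, 3)


def word_count(statements: list[str]) -> int:
    # Crude proxy for token cost: whitespace-separated words.
    return sum(len(s.split()) for s in statements)


print(small)
print(word_count(small), word_count(large))
```

A larger cap passes more statements to the predictor, and the word count of the context (a rough stand-in for tokens) grows with it.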
See Configuration for all options.