Core Concepts

memv processes conversations through a pipeline that turns raw messages into retrievable knowledge.

flowchart TD
    A[Messages] -->|add_exchange| B[Message Buffer]
    B -->|process| C[Episode Segmentation]
    C --> D[Episode Generation]
    D --> E[Predict-Calibrate Extraction]
    E --> F[SemanticKnowledge]
    F --> G[VectorIndex<br>sqlite-vec]
    F --> H[TextIndex<br>FTS5]
    G & H -->|retrieve| I[RRF Fusion]
    I --> J[RetrievalResult]

How It Works

  1. Messages are stored immediately via add_exchange(). They accumulate until process() is called (or auto-processing triggers). Short, illustrative sketches of each step follow this list.

  2. Episodes — Messages are segmented into coherent conversation groups based on topic shifts, intent changes, and time gaps. Each episode gets a narrative summary.

  3. Predict-Calibrate — For each episode, memv predicts what it should contain based on existing knowledge, then extracts only what was unpredicted. This is the core innovation.

  4. Bi-Temporal Validity — Extracted knowledge tracks both when facts were true in the world and when memv learned them. Contradictions invalidate old facts rather than deleting them.

  5. Retrieval — Queries run through both vector similarity search and BM25 full-text search; the two ranked lists are merged with Reciprocal Rank Fusion (RRF).
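
The buffering and retrieval flow from steps 1 and 5 looks roughly like this from the caller's side. This is a minimal sketch: add_exchange(), process(), and the retrieve step are named on this page, but the constructor arguments, the retrieve() signature, and the client wiring are assumptions rather than memv's confirmed API.

from memv.config import MemoryConfig   # import paths assumed from the module tree below
from memv.memory import Memory

# Hypothetical wiring: real usage supplies EmbeddingClient/LLMClient
# implementations (see protocols.py); argument names here are illustrative.
memory = Memory(config=MemoryConfig())

# Step 1: exchanges are stored immediately in the message buffer.
memory.add_exchange("What did we decide about the launch date?",
                    "We pushed it to March.")

# Steps 2-4: segmentation, episode generation, and predict-calibrate
# extraction all run when the buffer is processed.
memory.process()

# Step 5: hybrid retrieval (vector + BM25, fused with RRF) returns
# RetrievalResult objects.
results = memory.retrieve("launch date")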
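Step 2 can be pictured with a deliberately simplified segmenter that only looks at time gaps. The real BoundaryDetector also weighs topic shifts and intent changes, which require model judgment, so this standalone sketch illustrates only the grouping idea, using an invented Msg stand-in for memv's Message model.

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Msg:
    # Stand-in for memv's Message model; fields are illustrative.
    role: str
    content: str
    created_at: datetime

def segment_by_time_gap(messages: list[Msg],
                        gap: timedelta = timedelta(minutes=30)) -> list[list[Msg]]:
    # Start a new episode candidate whenever the gap to the previous
    # message exceeds the threshold; otherwise extend the current one.
    episodes: list[list[Msg]] = []
    for msg in messages:
        if episodes and msg.created_at - episodes[-1][-1].created_at <= gap:
            episodes[-1].append(msg)
        else:
            episodes.append([msg])
    return episodes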
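Step 3, reduced to its control flow: predict what the episode should contain given what is already known, then keep only the part of the episode the prediction missed. The llm parameter and prompt wording below are illustrative; the real PredictCalibrateExtractor's interface and prompts are not documented here.

def predict_calibrate(episode_summary: str, known_facts: list[str], llm) -> list[str]:
    # Predict: what should this episode say, given existing knowledge?
    predicted = llm.complete(
        "Known facts:\n" + "\n".join(known_facts) +
        "\nPredict the statements this episode is likely to contain."
    )
    # Calibrate: keep only what the prediction did not already cover;
    # those unpredicted statements become new SemanticKnowledge.
    unpredicted = llm.complete(
        f"Episode summary:\n{episode_summary}\n"
        f"Predicted content:\n{predicted}\n"
        "List only the statements that were NOT predicted, one per line."
    )
    return [line for line in unpredicted.splitlines() if line.strip()]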
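Step 4 amounts to keeping two timelines per fact: when the fact held in the world (valid time) and when memv learned it (transaction time). A contradiction closes the old fact's validity window rather than deleting the row. Field and function names below are illustrative, not SemanticKnowledge's actual schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Fact:
    statement: str
    valid_from: datetime                 # when the fact became true in the world
    valid_to: datetime | None = None     # None means still considered true
    learned_at: datetime = field(        # when memv learned the fact
        default_factory=lambda: datetime.now(timezone.utc))

def supersede(old: Fact, new_statement: str, effective: datetime) -> Fact:
    # A contradiction invalidates the old fact but never deletes it,
    # so the history of what was believed, and when, is preserved.
    old.valid_to = effective
    return Fact(statement=new_statement, valid_from=effective)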
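Step 5's fusion is standard Reciprocal Rank Fusion: each candidate's fused score is the sum of 1 / (k + rank) over the rankings it appears in. The k = 60 below is the conventional default from the RRF literature, not necessarily the constant memv uses.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Sum 1 / (k + rank) for every ranking a document appears in,
    # then order candidates by their fused score.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: ids ranked by the vector index and by the FTS5 index.
vector_hits = ["k42", "k07", "k13"]
text_hits = ["k07", "k99", "k42"]
print(rrf_fuse([vector_hits, text_hits]))  # k07 and k42 rise to the top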

Module Structure

src/memv/
├── memory/           # Memory class (main API)
├── processing/       # BoundaryDetector, EpisodeGenerator, PredictCalibrateExtractor
├── retrieval/        # Hybrid search with RRF
├── storage/sqlite/   # MessageStore, EpisodeStore, KnowledgeStore, VectorIndex, TextIndex
├── models.py         # Message, Episode, SemanticKnowledge, RetrievalResult
├── config.py         # MemoryConfig
└── protocols.py      # EmbeddingClient, LLMClient
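
protocols.py implies that the embedding and LLM backends are pluggable behind structural interfaces. A hypothetical shape for those protocols, for orientation only; the actual method names and signatures may differ:

from typing import Protocol

class EmbeddingClient(Protocol):
    # Hypothetical interface: turn texts into vectors for the VectorIndex.
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class LLMClient(Protocol):
    # Hypothetical interface: generate text for episode summaries and
    # predict-calibrate extraction.
    def complete(self, prompt: str) -> str: ...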