# Retrieval

memv uses hybrid search, combining vector similarity and BM25 full-text search and merging the results with Reciprocal Rank Fusion (RRF).

## How It Works

```mermaid
flowchart LR
    Q[Query] --> EMB[Embed Query]
    Q --> TXT[Query Text]
    EMB --> VS[Vector Search<br>knowledge]
    TXT --> BM[BM25 Search<br>knowledge]
    VS & BM --> RRF[RRF Fusion<br>k=60]
    RRF --> R[RetrievalResult]
```

The query is embedded using your EmbeddingClient and compared against stored embeddings via sqlite-vec. This captures semantic similarity — "Where does the user work?" matches "Employed at Anthropic" even though they share no keywords.

The query is also run through FTS5 full-text search. This catches exact keyword matches that vector search might miss — proper nouns, technical terms, specific numbers.
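The complementary strengths can be illustrated with cosine similarity over toy vectors (the numbers below are made up; real embeddings come from your EmbeddingClient and live in sqlite-vec):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (illustrative only)
query     = [0.9, 0.1, 0.2]  # "Where does the user work?"
stored    = [0.8, 0.2, 0.3]  # "Employed at Anthropic"
unrelated = [0.1, 0.9, 0.1]  # "Prefers dark roast coffee"

# The semantically related statement scores higher despite sharing no keywords
print(cosine(query, stored) > cosine(query, unrelated))  # True
```

Vector search would surface "Employed at Anthropic" here; FTS5 would instead win on a query like "Anthropic", where the exact token is what matters.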

## Reciprocal Rank Fusion

Results from both searches are merged using RRF:

```text
RRF_score = 1/(k + rank_vector) + 1/(k + rank_text)
```

With k=60, this balances both signals without either dominating: a single top rank cannot overwhelm consistent mid-list rankings. Items that rank well in both searches score highest; an item found by only one search contributes just one term to the sum.
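The fusion step can be sketched as follows (function and variable names are illustrative, not memv's internals):

```python
def rrf_fuse(vector_ids: list[str], text_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranked in (vector_ids, text_ids):
        for rank, doc_id in enumerate(ranked, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks 2nd in vector search and 1st in text search; that consistency
# beats "a", which tops one list but sits 3rd in the other.
print(rrf_fuse(["a", "b", "c"], ["b", "d", "a"]))  # ['b', 'a', 'd', 'c']
```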

## Tuning

### `vector_weight`

The `vector_weight` parameter (0.0 to 1.0) controls the balance between the two signals:

```python
# Semantic-heavy (good for vague queries)
result = await memory.retrieve(query, user_id=uid, vector_weight=0.8)

# Keyword-heavy (good for specific terms)
result = await memory.retrieve(query, user_id=uid, vector_weight=0.2)

# Balanced (default)
result = await memory.retrieve(query, user_id=uid, vector_weight=0.5)
```

### `top_k`

Controls how many results to return per category:

```python
result = await memory.retrieve(query, user_id=uid, top_k=10)  # Default
```

## Output Formatting

RetrievalResult provides two formatting options:

### `to_prompt()`

Flat list of knowledge statements — designed for LLM context injection:

```markdown
## Relevant Context
- The user works at Anthropic as a researcher
- Their focus area is AI safety, specifically interpretability
- The user prefers Python for data analysis
```
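A typical use is injecting the formatted block into a system prompt before calling an LLM (the surrounding prompt text is illustrative; `context` stands in for the string returned by `result.to_prompt()`):

```python
# Stand-in for result.to_prompt() output
context = (
    "## Relevant Context\n"
    "- The user works at Anthropic as a researcher\n"
    "- The user prefers Python for data analysis"
)

system_prompt = (
    "You are a helpful assistant. Use the remembered facts below when relevant.\n\n"
    f"{context}"
)
print(system_prompt)
```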

## Embedding Cache

An optional LRU cache with TTL reduces API calls for repeated or similar queries:

```python
memory = Memory(
    # ...
    enable_embedding_cache=True,         # Default
    embedding_cache_size=1000,           # Max cached embeddings
    embedding_cache_ttl_seconds=600,     # 10 minute TTL
)
```