# Retrieval — How Questions Get Answered
When you ask GraphRAG a question, it doesn't just search for keywords. It uses multiple retrieval paths in parallel — like asking five different experts to find relevant information, then combining their best findings into one answer.
This document explains how each path works, what it contributes, and how they perform individually and together.
## The Big Picture
```
                       Your Question
                             |
                ┌────────────┴────────────┐
                v                         v
       1. Extract Keywords         2. Embed Question
                |                         |
     ┌──────────┴────────────┬────────────┴─────────┐
     v                       v                      v
3a. RELATES Vector     3b. Text-to-Cypher    4. Entity Discovery
   (fact search)         (graph queries)       (name matching)
     |                       |                      |
     └───────────────────────┴──────────────────────┘
                             |
                 5. Relationship Expansion
                             |
                6. Chunk Retrieval (4 paths)
              ┌───────┬───────┬───────┐
              v       v       v       v
            Full-   Vector   MENT-   2-hop
            text    search   IONED   graph
              └───────┴───────┴───────┘
                             |
                   7. Document Mapping
                             |
              ┌──────────────┴──────────────┐
              v                             v
     8a. Rerank Passages            8b. Filter Facts
     (stored embeddings)           (score threshold)
              |                             |
              └──────────────┬──────────────┘
                             v
          9. Assemble Context ← Cypher results
                             |    (direct, no reranking)
                             v
                    Final LLM Answer
```
## Step-by-Step Explanation

### Step 1 — Keyword Extraction
What it does: Pulls out the important words from your question.
How: Two methods run together:

- Simple filter: Removes common words ("the", "is", "what") and keeps meaningful terms
- LLM extraction: Asks the language model to identify proper nouns, names, places, and specific terms
Example:

```
Question: "What did Professor Harmon discover at the lighthouse?"

Simple keywords: ["professor", "harmon", "discover", "lighthouse"]
LLM keywords:    ["Professor Harmon", "lighthouse"]
```
Code: `MultiPathRetrieval._extract_keywords()` in `multi_path.py`
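The simple-filter half can be sketched in a few lines. This is a minimal illustration, not the SDK's actual implementation; the stopword set is a tiny stand-in for a real list:

```python
import re

# Tiny illustrative stopword set; a real list is much larger.
STOPWORDS = {"the", "is", "a", "an", "what", "did", "at", "of", "to", "in"}

def simple_keywords(question: str) -> list[str]:
    """Lowercase, tokenize, and drop common words."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(simple_keywords("What did Professor Harmon discover at the lighthouse?"))
# ['professor', 'harmon', 'discover', 'lighthouse']
```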
### Step 2 — Embed the Question
What it does: Converts the question text into a numerical vector (a list of numbers) that captures its meaning.
Why: This vector is used later to find chunks and facts that are semantically similar to the question — even if they don't share the exact same words.
Code: `Embedder.aembed_query()` in `providers/base.py`
### Step 3a — RELATES Vector Search (Knowledge Graph Facts)
What it does: Searches the relationship edges in the graph by meaning similarity. Every relationship between entities (like "Alice WORKS_AT Acme Corp") has been embedded as a vector during ingestion. This step finds the relationships most relevant to the question.
Returns: Fact strings like:
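```
Alice —[WORKS_AT]→ Acme Corp: Alice joined Acme as a senior engineer in 2019
```

(The same entity/relationship/evidence format appears again in Step 5.)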
Important: Facts are scored by their vector similarity to the question. Low-scoring facts are filtered out (see Step 8b) to reduce noise.
Code: `search_relates_edges()` in `entity_discovery.py`
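Conceptually this is a nearest-neighbor lookup over relationship embeddings. A toy sketch with in-memory vectors; the real path queries the graph's vector index rather than looping in Python, and `search_relates` is a hypothetical name:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fact string -> embedding stored at ingestion time (toy 3-dim vectors).
relates_index = {
    "Alice —[WORKS_AT]→ Acme Corp": np.array([0.9, 0.1, 0.0]),
    "Acme Corp —[LOCATED_IN]→ New York": np.array([0.2, 0.8, 0.1]),
}

def search_relates(question_vec: np.ndarray, top_k: int = 15) -> list[tuple[str, float]]:
    """Score every fact edge against the question and keep the best top_k."""
    scored = [(fact, cosine(question_vec, vec)) for fact, vec in relates_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

The `top_k` default mirrors the `rel_top_k` parameter listed under Configuration.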
### Step 3b — Text-to-Cypher (Graph Queries)
What it does: Asks the LLM to write a database query (in the Cypher language) that can directly answer the question from the graph structure.
Why it matters: Some questions need structural information that text search can't provide:
- "How many organizations are in the story?" → needs COUNT
- "What connects Alice and the castle?" → needs graph path traversal
- "List all locations mentioned" → needs MATCH (l:Location) RETURN l.name
How it works:

1. The LLM receives a description of the graph schema (what node types exist, how edges work)
2. It generates a Cypher query tailored to the question
3. The query is validated (read-only, valid labels) and sanitized (adds `LIMIT`, removes unsupported FalkorDB patterns)
4. If the query executes successfully, the results go directly to the final LLM context — they are NOT filtered by the reranker
Runs in parallel with step 3a so the extra LLM call adds as little latency as possible (about 1.5s at the median when enabled; see the benchmarks below). If it fails, the other paths still produce results.
Code: `execute_cypher_retrieval()` in `cypher_generation.py`
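A sketch of the validate-and-sanitize step. The real checks in `cypher_generation.py` are more thorough; the write-clause list and the row cap below are illustrative assumptions:

```python
import re

WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b", re.IGNORECASE)

def sanitize_cypher(query: str, max_rows: int = 50) -> str:
    """Reject write queries and cap result size before execution."""
    if WRITE_CLAUSES.search(query):
        raise ValueError("only read-only queries are allowed")
    if not re.search(r"\bLIMIT\s+\d+", query, re.IGNORECASE):
        query = f"{query.rstrip().rstrip(';')} LIMIT {max_rows}"
    return query

print(sanitize_cypher("MATCH (l:Location) RETURN l.name"))
# MATCH (l:Location) RETURN l.name LIMIT 50
```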
### Step 4 — Entity Discovery
What it does: Finds entities (people, places, organizations, etc.) in the graph that match the question's keywords.
Two paths:
- Path A — Name matching: Searches entity names using `CONTAINS` (e.g., "lighthouse" matches "The Old Lighthouse"). Runs as a single batched database query for efficiency.
- Path B — Fulltext search: Uses the text search index on entity names and descriptions. Good for partial matches and stemming.
Entities found in steps 3a and 3b are merged into this set as well.
Code: `discover_entities()` in `entity_discovery.py`
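Path A's batching can be expressed as a single parameterized query. In this sketch the `Entity` label, the property names, and the `graph_store.query` call shape are assumptions, not the SDK's actual schema or API:

```python
# One round trip for all keywords instead of one query per keyword.
BATCHED_NAME_MATCH = """
UNWIND $keywords AS kw
MATCH (e:Entity)
WHERE toLower(e.name) CONTAINS toLower(kw)
RETURN DISTINCT e.name AS name, e.description AS description
"""

params = {"keywords": ["Professor Harmon", "lighthouse"]}
# rows = graph_store.query(BATCHED_NAME_MATCH, params)
```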
### Step 5 — Relationship Expansion
What it does: Starting from the discovered entities, traverses the graph to find their relationships.
Two depths:

- 1-hop: Direct relationships (Alice → WORKS_AT → Acme Corp)
- 2-hop: Indirect connections through an intermediate entity (Alice → WORKS_AT → Acme → LOCATED_IN → New York)
Returns: Formatted relationship strings with evidence, like:

```
Alice —[WORKS_AT]→ Acme Corp: Alice joined Acme as a senior engineer in 2019
Acme Corp —[LOCATED_IN]→ New York: Acme Corp headquarters is in Manhattan
```
Code: `expand_relationships()` in `relationship_expansion.py`
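Both depths map naturally onto Cypher patterns. An illustrative sketch; the labels, property names, and the `r.description` evidence field are assumptions:

```python
# 1-hop: direct relationships of the discovered entities.
ONE_HOP = """
MATCH (e:Entity)-[r]->(n:Entity)
WHERE e.name IN $entity_names
RETURN e.name AS source, type(r) AS rel, n.name AS target, r.description AS evidence
LIMIT 20
"""

# 2-hop: connections through one intermediate entity.
TWO_HOP = """
MATCH (e:Entity)-[r1]->(mid:Entity)-[r2]->(far:Entity)
WHERE e.name IN $entity_names
RETURN e.name, type(r1), mid.name, type(r2), far.name
LIMIT 20
"""
```

The `LIMIT 20` mirrors the `max_relationships` default listed under Configuration.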
### Step 6 — Chunk Retrieval (4 Paths)
What it does: Finds the actual text passages (chunks) from the original documents that are most relevant to the question. This is the core of passage-based retrieval.
Four independent paths ensure we don't miss relevant passages:
| Path | Method | What it finds |
|---|---|---|
| A — Fulltext | Keyword search on chunk text | Passages containing exact keyword matches |
| B — Vector | Embedding similarity search | Passages with similar meaning (even without shared keywords) |
| C — MENTIONED_IN | Graph traversal from entities to their source chunks | Passages where discovered entities were originally extracted |
| D — 2-hop | Entity → related entity → chunk | Passages containing entities that are connected to the question's entities |
All four paths contribute to a single pool of candidate chunks.
Code: `retrieve_chunks()` in `chunk_retrieval.py`
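Pooling the four paths amounts to an order-preserving merge with deduplication. A minimal sketch, assuming each candidate chunk carries a stable `id` (the dict shape is an assumption):

```python
def pool_candidates(*paths: list[dict]) -> list[dict]:
    """Merge chunk lists from the four paths, keeping the first copy of each chunk."""
    seen: set[str] = set()
    pooled: list[dict] = []
    for path in paths:
        for chunk in path:
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                pooled.append(chunk)
    return pooled

# candidates = pool_candidates(fulltext_hits, vector_hits, mentioned_in_hits, two_hop_hits)
```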
### Step 7 — Document Mapping
What it does: Looks up which source document each chunk came from, so the final answer can reference the source.
Code: `fetch_chunk_documents()` in `chunk_retrieval.py`
### Step 8 — Reranking (Differentiated)
Facts and passages are ranked by different criteria because they have different characteristics:
#### 8a — Passage Reranking (Stored Embeddings)
What it does: Ranks the candidate chunks by how similar their meaning is to the question, keeping only the top 15.
How: Each chunk already has an embedding vector stored in the graph from ingestion. Instead of re-computing embeddings (which would require an expensive API call), we fetch the stored vectors and compute cosine similarity locally. This makes reranking instant instead of taking 2-3 seconds.
Code: `rerank_chunks()` in `result_assembly.py`
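A sketch of the idea, assuming each candidate arrives with its ingestion-time embedding attached; the dict shape and function name are illustrative, and the real logic lives in `rerank_chunks()`:

```python
import numpy as np

def rerank_stored(question_vec: np.ndarray, chunks: list[dict], top_k: int = 15) -> list[dict]:
    """Cosine-rank chunks using embeddings stored at ingestion; no new embedding calls."""
    q = question_vec / np.linalg.norm(question_vec)

    def score(chunk: dict) -> float:
        v = np.asarray(chunk["embedding"], dtype=float)
        return float(q @ (v / np.linalg.norm(v)))

    return sorted(chunks, key=score, reverse=True)[:top_k]
```

The `top_k=15` default matches the `chunk_top_k` parameter listed under Configuration.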
#### 8b — Fact Filtering (Score Threshold)
What it does: Filters knowledge graph facts by their vector similarity score from step 3a.
Why separate? Facts are short structured strings ("Alice —[WORKS_AT]→ Acme") while passages are long prose paragraphs. Short text has higher cosine-similarity variance, so a threshold that works for passages would let too many irrelevant facts through. Facts therefore use a higher threshold (0.25), and the top 3 facts are always kept.
Code: `filter_facts_by_relevance()` in `result_assembly.py`
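A sketch of the threshold-with-floor behavior. The 0.25 threshold and the keep-3 floor are the values stated above; the function shape is illustrative:

```python
def filter_facts(scored_facts: list[tuple[str, float]],
                 threshold: float = 0.25, min_keep: int = 3) -> list[str]:
    """Drop low-similarity facts, but never return fewer than min_keep."""
    ranked = sorted(scored_facts, key=lambda pair: pair[1], reverse=True)
    kept = [fact for fact, score in ranked if score >= threshold]
    if len(kept) < min_keep:
        kept = [fact for fact, _ in ranked[:min_keep]]
    return kept
```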
### Step 9 — Context Assembly
What it does: Combines everything into a structured context document that the LLM uses to generate the final answer.
Sections (in order):

1. Answer format hint — e.g., "This is a yes/no question" for yes/no questions
2. Graph Query Results — direct results from text-to-cypher (bypasses reranking)
3. Key Entities — names and descriptions of relevant entities
4. Entity Relationships — how entities connect to each other
5. Knowledge Graph Facts — evidence from relationship embeddings
6. Source Document Passages — ranked text passages with source attribution
The final LLM receives this structured context and generates a natural language answer.
Code: `assemble_raw_result()` in `result_assembly.py`
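A sketch of the ordering. The section headings and function shape are illustrative; the real template lives in `result_assembly.py`:

```python
def assemble_context(hint: str, cypher_rows: str, entities: str,
                     relationships: str, facts: str, passages: str) -> str:
    """Join the six sections in the order the final LLM sees them."""
    sections = [
        hint,                                         # 1. answer format hint
        f"## Graph Query Results\n{cypher_rows}",     # 2. text-to-cypher, unreranked
        f"## Key Entities\n{entities}",               # 3.
        f"## Entity Relationships\n{relationships}",  # 4.
        f"## Knowledge Graph Facts\n{facts}",         # 5.
        f"## Source Document Passages\n{passages}",   # 6.
    ]
    return "\n\n".join(s for s in sections if s.strip())
```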
## Benchmark Results
All experiments were run on the same pre-built graph (`graphrag_sdk_v2_retrieval_benchmark`) with 100 questions scored by an LLM judge (0-10 scale).
### Enhancement Comparison
| Configuration | Accuracy | vs Baseline | P50 Latency | What changed |
|---|---|---|---|---|
| Baseline (MultiPath only) | 83.8% | -- | 4.2s | Current default |
| + Text-to-Cypher | 84.2% | +0.4% | 5.7s | Added graph query path |
| + All Enhancements | 84.8% | +1.0% | 5.7s | Stored rerank + fact filtering + cypher |
### Accuracy by Question Type
| Question Type | Baseline | All Enhancements | Change |
|---|---|---|---|
| Complex Reasoning | 83% | 84% | +1% |
| Contextual Summarize | 84% | 87% | +3% |
| Creative Generation | 80% | 80% | -- |
| Fact Retrieval | 85% | 85% | -- |
Text-to-Cypher helps most with Contextual Summarize questions, where structured graph relationships give the LLM the context it needs to produce comprehensive summaries.
### Isolated Path Performance
Each retrieval path was tested in isolation (only that path, no others) to measure its individual contribution:
| Rank | Retrieval Path | Accuracy Alone | Role in Pipeline |
|---|---|---|---|
| 1 | Vector Search (chunk embeddings) | 77.4% | The backbone — strongest single path |
| 2 | Fulltext Search (keyword matching) | 74.8% | Catches exact keyword matches that vectors miss |
| 3 | RELATES Vector (relationship facts) | 70.0% | Provides structured knowledge (who/what/where) |
| 4 | Graph Traversal (MENTIONED_IN edges) | 69.1% | Links entities back to their source text |
| 5 | Text-to-Cypher (LLM-generated queries) | 53.0% | Weakest alone, but adds unique structural signal |
Key insight: No single path matches the combined pipeline (83.8%). The value is in multi-path fusion — each path finds information the others miss, and together they cover more ground than any individual approach.
### Latency Profile
| Retrieval Path | P50 Latency | Notes |
|---|---|---|
| RELATES vector search | 1.6s | Fastest — single vector index query |
| Vector chunk search | 2.2s | Fast — single vector index query |
| Fulltext chunk search | 2.7s | Multiple keyword queries |
| Graph traversal | 2.8s | Depends on entity fanout |
| Text-to-Cypher | 4.5s | Includes LLM call for query generation |
| Full MultiPath pipeline | 4.2s | Steps run in parallel where possible |
## Configuration

### Toggling Text-to-Cypher
```python
from graphrag_sdk.retrieval.strategies.multi_path import MultiPathRetrieval

# Without text-to-cypher (default)
strategy = MultiPathRetrieval(
    graph_store=rag.graph_store,
    vector_store=rag.vector_store,
    embedder=embedder,
    llm=llm,
    enable_cypher=False,  # default
)

# With text-to-cypher (opt-in, adds ~1.5s latency)
strategy = MultiPathRetrieval(
    graph_store=rag.graph_store,
    vector_store=rag.vector_store,
    embedder=embedder,
    llm=llm,
    enable_cypher=True,
)

result = await rag.completion("Your question", strategy=strategy)
```
### Tuning Parameters
| Parameter | Default | What it controls |
|---|---|---|
| `chunk_top_k` | 15 | Maximum passages after reranking |
| `max_entities` | 30 | Maximum entities to keep from discovery |
| `max_relationships` | 20 | Maximum relationships to include |
| `rel_top_k` | 15 | RELATES edge vector search results |
| `enable_cypher` | False | Toggle text-to-cypher path (opt-in) |
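These presumably map to `MultiPathRetrieval` keyword arguments. A hedged example, assuming all five are accepted by the constructor shown above:

```python
strategy = MultiPathRetrieval(
    graph_store=rag.graph_store,
    vector_store=rag.vector_store,
    embedder=embedder,
    llm=llm,
    chunk_top_k=15,        # passages kept after reranking
    max_entities=30,       # entities kept from discovery
    max_relationships=20,  # relationships included in context
    rel_top_k=15,          # RELATES edge vector hits
    enable_cypher=False,   # opt-in text-to-cypher path
)
```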
### Using the External Reranker
The pipeline has built-in reranking (step 8), but you can also apply an external reranker after retrieval:
```python
from graphrag_sdk.retrieval.reranking_strategies.cosine import CosineReranker

reranker = CosineReranker(embedder=embedder, top_k=10)
result = await rag.completion("Your question", reranker=reranker)
```
## File Reference
| File | What it contains |
|---|---|
| `multi_path.py` | Main orchestrator — coordinates all 9 steps |
| `entity_discovery.py` | RELATES vector search + 2-path entity discovery |
| `chunk_retrieval.py` | 4-path chunk retrieval + document mapping |
| `relationship_expansion.py` | 1-hop and 2-hop relationship traversal |
| `result_assembly.py` | Reranking, fact filtering, question hints, context assembly |
| `cypher_generation.py` | Text-to-Cypher: schema prompt, generation, validation, execution |
| `base.py` | RetrievalStrategy abstract base class |
| `local.py` | Simple vector + 1-hop retrieval (alternative strategy) |
| `cosine.py` | External cosine reranker |
| `router.py` | Semantic router for conditional strategy selection |