Hybrid Retrieval (v7.4)
CKB v7.4 introduces hybrid retrieval, which combines graph-based ranking with traditional text search to improve search quality: Recall@10 rises from 62.1% to 100% and MRR from 0.546 to 0.914, with no measurable change in latency.
Overview
Traditional code search relies on text matching (FTS), which finds symbols by name but doesn't understand code relationships. Hybrid retrieval adds Personalized PageRank (PPR) over the symbol graph to boost results that are structurally related to your query.
Results
| Metric | Before | After | Improvement |
|---|---|---|---|
| Recall@10 | 62.1% | 100% | +61% |
| MRR | 0.546 | 0.914 | +67% |
| Latency | 29.4ms | 29.0ms | ~0% |
Note: Improvement is relative to the Before value. Latency remains similar because PPR computation is cheap; the gain is in search quality, not speed.
How It Works
1. Initial Search (FTS)
When you search for a symbol, CKB first uses SQLite FTS5 for fast text matching:
Query: "Engine"
FTS Results: Engine, Engine#logger, Engine#config, EngineMock, ...
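To make the FTS step concrete, here is a minimal sketch of a prefix search against an FTS5 table using Python's built-in sqlite3 module. The table name symbols_fts, its single name column, and the fts_search helper are assumptions for illustration; CKB's actual schema is not shown on this page.

```python
import sqlite3

# Hypothetical schema: one FTS5 table holding symbol names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE symbols_fts USING fts5(name)")
conn.executemany(
    "INSERT INTO symbols_fts(name) VALUES (?)",
    [("Engine",), ("Engine#logger",), ("Engine#config",),
     ("EngineMock",), ("Orchestrator",)],
)

def fts_search(conn, query, limit=10):
    """Return symbol names matching a prefix query, best match first.

    Assumes `query` is a plain identifier; FTS5 operators and punctuation
    in user input would need quoting before being passed to MATCH.
    """
    rows = conn.execute(
        "SELECT name FROM symbols_fts WHERE symbols_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query + "*", limit),
    )
    return [name for (name,) in rows]

hits = fts_search(conn, "Engine")
```

The trailing * turns the term into a prefix query, which is why EngineMock matches alongside Engine and its fields, while Orchestrator does not.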
2. Graph-Based Re-ranking (PPR)
CKB then builds a symbol graph from SCIP edges and runs Personalized PageRank:
Seeds: Top FTS hits (Engine, Engine#logger, ...)
Graph: Call edges, reference edges, type edges
PPR: Propagate importance through graph
Output: Re-ranked by graph proximity + FTS score
Seed Expansion
When FTS returns struct fields (e.g., Engine#logger), seed expansion automatically includes related methods:
FTS seeds: Engine#logger, Engine#config, Engine#db
Expanded: + Engine#SearchSymbols(), Engine#GetCallGraph(), ...
This helps PPR discover cross-module dependencies through method calls, not just field references.
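Seed expansion can be sketched as follows. The sketch assumes the Type#member naming used in the examples above and treats any member ending in () as a method; the expand_seeds helper and both conventions are illustrative, not CKB's actual implementation.

```python
def expand_seeds(seeds, all_symbols):
    """Append methods of each seed's owning type to the seed list.

    Assumes 'Type#member' symbol names, with methods ending in '()'.
    """
    # Owning types of every field-like seed, e.g. 'Engine' for 'Engine#logger'.
    owners = {s.split("#", 1)[0] for s in seeds if "#" in s}
    expanded = list(seeds)
    seen = set(seeds)
    for sym in all_symbols:
        if "#" not in sym:
            continue
        owner, member = sym.split("#", 1)
        if owner in owners and member.endswith("()") and sym not in seen:
            expanded.append(sym)
            seen.add(sym)
    return expanded
```

Original seeds keep their order and expanded methods are appended after them, so the expansion only adds entry points for PPR and never displaces an FTS hit.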
3. Combined Scoring
The final ranking combines FTS position with PPR scores:
combined = 0.6 * position_score + 0.4 * ppr_score
Where:
- position_score = 1 / (rank + 1) — Original FTS ranking bonus
- ppr_score — Normalized PPR importance from graph traversal
This simple approach reaches 100% Recall@10 in the results above without the complexity of multi-signal fusion.
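A worked sketch of the formula, under two assumptions that this page does not spell out: rank is the 0-based FTS position, and PPR scores are normalized by dividing by the largest score among the hits. The combined_scores helper is hypothetical.

```python
def combined_scores(fts_ranked, ppr_scores, w_pos=0.6, w_ppr=0.4):
    """Re-rank FTS hits by w_pos * position_score + w_ppr * ppr_score."""
    max_ppr = max(ppr_scores.values(), default=0.0)
    scored = []
    for rank, sym in enumerate(fts_ranked):  # rank is 0-based (assumption)
        position_score = 1.0 / (rank + 1)
        # Assumed normalization: scale so the best PPR score becomes 1.0.
        ppr_score = ppr_scores.get(sym, 0.0) / max_ppr if max_ppr > 0 else 0.0
        scored.append((w_pos * position_score + w_ppr * ppr_score, sym))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sym, score) for score, sym in scored]

ranked = combined_scores(
    ["EngineMock", "Engine", "Engine#logger"],
    {"EngineMock": 0.05, "Engine": 0.5, "Engine#logger": 0.25},
)
```

With these illustrative inputs, Engine scores 0.6 * 0.5 + 0.4 * 1.0 = 0.70 and overtakes EngineMock at 0.6 * 1.0 + 0.4 * 0.1 = 0.64, even though FTS ranked EngineMock first.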
Eval Suite
CKB includes an evaluation framework to measure retrieval quality.
Running Eval
# Run built-in tests
ckb eval
# Custom fixtures
ckb eval --fixtures=./my-tests.json
# JSON output
ckb eval --format=json
Test Types
Needle tests - Find at least one expected symbol in top-K:
{
"id": "find-engine",
"type": "needle",
"query": "Engine",
"expectedSymbols": ["Engine", "query.Engine"],
"topK": 10
}
Ranking tests - Verify expected symbol is highly ranked:
{
"id": "engine-first",
"type": "ranking",
"query": "query engine",
"expectedSymbols": ["Engine"],
"topK": 3
}
Expansion tests - Check graph connectivity:
{
"id": "engine-connects-backends",
"type": "expansion",
"query": "Engine",
"expectedSymbols": ["Engine", "Orchestrator", "SCIPAdapter"],
"topK": 20
}
Metrics
- Recall@K - % of tests where expected symbol was in top-K
- MRR - Mean Reciprocal Rank (higher = expected found earlier)
- Latency - Average query time
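For reference, Recall@K and MRR can be computed as in the sketch below. It assumes each test contributes the rank of its first expected symbol within top-K and that a miss contributes 0 to MRR; whether CKB's eval command uses exactly these conventions is not stated here.

```python
def first_hit_rank(results, expected, top_k):
    """1-based rank of the first expected symbol within top_k, or None."""
    for i, sym in enumerate(results[:top_k], start=1):
        if sym in expected:
            return i
    return None

def evaluate(cases):
    """cases: list of (results, expected_symbols, top_k) tuples.

    Returns (recall_at_k, mrr). A test with no hit counts as 0 for both.
    """
    ranks = [first_hit_rank(r, set(e), k) for r, e, k in cases]
    recall = sum(r is not None for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks if r is not None) / len(ranks)
    return recall, mrr
```

For three tests whose first hits land at rank 1, rank 2, and outside top-K, this gives a recall of 2/3 and an MRR of (1 + 0.5 + 0) / 3 = 0.5.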
PPR Algorithm
Personalized PageRank computes importance scores relative to seed nodes.
Algorithm
Input:
- seeds: FTS hit symbol IDs
- graph: SCIP call/reference edges
- damping: 0.85 (probability of following edge)
- iterations: 20 (max power iterations)
Process:
1. Initialize scores: each of the n seeds gets 1/n, all other nodes get 0
2. Iterate: score[i] = damping * Σ(edge_weight * score[neighbor])
+ (1-damping) * teleport[i]
where teleport[i] is 1/n for seeds and 0 otherwise
3. Stop when converged or max iterations
Output:
- Ranked nodes with scores
- Backtracked paths explaining "why"
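The process above can be sketched as a small power iteration. Two details are assumptions added for the sketch and are not stated on this page: outgoing edge weights are normalized per node so the scores stay a probability distribution, and nodes with no outgoing edges return their mass to the seeds. Path backtracking is omitted.

```python
def personalized_pagerank(graph, seeds, damping=0.85, max_iter=20, tol=1e-6):
    """graph: {node: {neighbor: weight}} of outgoing edges.

    Returns {node: score}; scores sum to 1.
    """
    seeds = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    nodes = set(graph) | set(seeds)
    for nbrs in graph.values():
        nodes.update(nbrs)
    teleport = {n: 0.0 for n in nodes}
    for s in seeds:
        teleport[s] = 1.0 / len(seeds)
    scores = dict(teleport)  # step 1: seeds get 1/n, others 0
    for _ in range(max_iter):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        dangling = 0.0
        for src in nodes:
            out = graph.get(src, {})
            total = sum(out.values())
            if total == 0:
                dangling += scores[src]  # no outgoing edges
                continue
            for dst, w in out.items():
                # Normalized weights (assumption): w / total sums to 1 per node.
                nxt[dst] += damping * scores[src] * w / total
        for n in nodes:
            # Dangling mass goes back to the seeds (assumption).
            nxt[n] += damping * dangling * teleport[n]
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:  # step 3: stop when converged
            break
    return scores
```

Because teleportation always returns to the seeds, a node that cannot be reached from any seed keeps a score of exactly 0, which is what makes the ranking personalized to the query.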
Edge Weights
| Edge Type | Weight | Meaning |
|---|---|---|
| Call | 1.0 | Function calls function |
| Definition | 0.9 | Reference to definition |
| Reference | 0.8 | General reference |
| Implements | 0.7 | Type implements interface |
| Type-of | 0.6 | Instance of type |
| Same-module | 0.3 | Co-located symbols |
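One way to apply this table is to turn typed edges into a weighted adjacency map before running PPR. In the sketch below, the lowercase edge-type keys, the build_weighted_graph helper, and the rule of keeping the strongest weight when two symbols share several edge types are all assumptions for illustration.

```python
# Weights copied from the table above; the key spellings are hypothetical.
EDGE_WEIGHTS = {
    "call": 1.0,
    "definition": 0.9,
    "reference": 0.8,
    "implements": 0.7,
    "type-of": 0.6,
    "same-module": 0.3,
}

def build_weighted_graph(edges):
    """edges: iterable of (src, dst, edge_type).

    Returns {src: {dst: weight}} with one weight per (src, dst) pair.
    """
    graph = {}
    for src, dst, kind in edges:
        weight = EDGE_WEIGHTS[kind]
        nbrs = graph.setdefault(src, {})
        # Several edge types between the same pair: keep the strongest (assumption).
        nbrs[dst] = max(nbrs.get(dst, 0.0), weight)
    return graph
```

Summing the weights of parallel edges would be an equally plausible choice; taking the maximum keeps every weight within the 0.3 to 1.0 range of the table.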
Export Organizer
The exportForLLM tool now includes an organizer step that structures output for better LLM comprehension.
Before (v7.3)
## internal/query/
! engine.go
$ Engine
# SearchSymbols()
# GetSymbol()
! symbols.go
# rankSearchResults()
After (v7.4)
## Module Map
| Module | Symbols | Files | Key Exports |
|--------|---------|-------|-------------|
| internal/query | 150 | 12 | Engine, SearchSymbols |
| internal/backends | 80 | 8 | Orchestrator, SCIPAdapter |
## Cross-Module Connections
- internal/query → internal/backends
- internal/mcp → internal/query
## Module Details
### internal/query/
**engine.go**
$ Engine
# SearchSymbols() [c=12] ★★
# GetSymbol() [c=5] ★
Benefits
- Module Map - Overview of codebase structure at a glance
- Cross-Module Bridges - Key integration points highlighted
- Importance Ordering - Most important symbols first
- Context Efficiency - LLMs understand structure before details
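As an illustration of the organizer step, the sketch below builds a Module Map table from a flat symbol list. The input shape (dicts with module, file, name, and callers keys) and the module_map helper are hypothetical, and ordering key exports by caller count is an assumption based on the [c=12] annotations in the example output.

```python
from collections import defaultdict

def module_map(symbols, max_exports=2):
    """Render a Module Map table, largest module first.

    symbols: list of dicts with 'module', 'file', 'name', 'callers' keys
    (a hypothetical shape used only for this sketch).
    """
    by_module = defaultdict(list)
    for sym in symbols:
        by_module[sym["module"]].append(sym)
    lines = [
        "| Module | Symbols | Files | Key Exports |",
        "|--------|---------|-------|-------------|",
    ]
    for module in sorted(by_module, key=lambda m: len(by_module[m]), reverse=True):
        syms = by_module[module]
        files = {s["file"] for s in syms}
        # Most-called symbols first (assumed meaning of "importance").
        top = sorted(syms, key=lambda s: s["callers"], reverse=True)[:max_exports]
        exports = ", ".join(s["name"] for s in top)
        lines.append(f"| {module} | {len(syms)} | {len(files)} | {exports} |")
    return "\n".join(lines)
```

Emitting the summary table before the per-module details mirrors the Context Efficiency point above: the reader sees the overall structure before any single file.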
Configuration
No configuration required. Hybrid retrieval is automatic when:
- SCIP index is available (ckb index was run)
- Search returns more than 3 results
- Symbol graph has nodes
Disabling PPR
If you need to disable PPR re-ranking (not recommended):
// .ckb/config.json
{
"queryPolicy": {
"enablePPR": false
}
}
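Taken together, the activation conditions and the enablePPR flag amount to a simple gate, sketched below. The function names are hypothetical, and treating a missing enablePPR key as enabled is an assumption inferred from "No configuration required".

```python
import json

def ppr_enabled(config_text):
    """Read queryPolicy.enablePPR from config JSON text.

    An empty config or a missing key is treated as enabled (assumption).
    """
    config = json.loads(config_text) if config_text.strip() else {}
    return bool(config.get("queryPolicy", {}).get("enablePPR", True))

def should_rerank(config_text, has_scip_index, fts_hit_count, graph_node_count):
    """True when PPR re-ranking would apply under the conditions listed above."""
    return (
        ppr_enabled(config_text)
        and has_scip_index
        and fts_hit_count > 3      # "more than 3 results"
        and graph_node_count > 0   # "symbol graph has nodes"
    )
```

When any condition fails, the natural fallback is the plain FTS ordering, so disabling PPR changes ranking only and not which symbols are found.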
Research Basis
Hybrid retrieval is based on 2024-2025 research:
| Paper | Key Insight |
|---|---|
| HippoRAG 2 (ICML 2025) | PPR over knowledge graphs improves associative retrieval |
| CodeRAG (Sep 2025) | Multi-path retrieval + reranking beats single-path |
| GraphCoder (Jun 2024) | Code context graphs for repo-level retrieval |
| GraphRAG surveys | Explicit organizer step improves context packing |
What's NOT Included
Per CKB's "structured over semantic" principle:
| Feature | Why Skipped |
|---|---|
| Embeddings | Adds complexity; PPR is sufficient for code navigation |
| Learned reranker | Deterministic scoring works well |
| External vector DB | Violates single-binary principle |
Troubleshooting
Low Recall@K
- Index freshness - Run ckb index to rebuild
- FTS population - Check ckb status for FTS symbol count
- Query specificity - More specific queries work better
Slow Queries
- Graph size - Very large codebases may need graph pruning
- PPR iterations - Default 20 is usually sufficient
- Cache - Subsequent queries benefit from caching
Debugging
# Check index status
ckb status
# Run diagnostics
ckb doctor
# Verbose eval output
ckb eval --verbose
Related Pages
- Architecture — System design overview
- Performance — Query latency and caching
- API-Reference — Search and query tools
- Practical-Limits — Accuracy and limitations