# CKB Performance
Performance characteristics and benchmarks for CKB tools.
## Latency Targets
CKB tools are classified by performance budget:
### v5.2 Navigation Tools
| Budget | P95 Target | Tools |
|---|---|---|
| Cheap | < 300ms | searchSymbols, explainFile, listEntrypoints, explainPath, getSymbol, explainSymbol |
| Heavy | < 2000ms | traceUsage, getArchitecture, getHotspots, summarizeDiff, recentlyRelevant, listKeyConcepts, analyzeImpact, getCallGraph, findReferences, justifySymbol |
### v6.0 Architectural Memory Tools
| Budget | P95 Target | Tools |
|---|---|---|
| Cheap | < 300ms | getModuleResponsibilities, getOwnership, recordDecision, getDecisions, annotateModule |
| Heavy | < 2000ms | getArchitecture, getHotspots |
| Heavy | < 30000ms | refreshArchitecture |
## Benchmark Results

**Environment:** Apple M4 Pro, Go 1.23, macOS
### v7.4 Tool Discovery Token Optimization

Tool presets reduce the token cost of the MCP `tools/list` response by up to 83%:
| Preset | Tools | Bytes | Tokens | vs Full |
|---|---|---|---|---|
| core (default) | 14 | 6,127 | ~1,531 | -83% |
| review | 19 | 9,177 | ~2,294 | -75% |
| refactor | 19 | 8,864 | ~2,216 | -75% |
| docs | 20 | 8,375 | ~2,093 | -77% |
| ops | 25 | 9,464 | ~2,366 | -74% |
| federation | 28 | 12,488 | ~3,122 | -65% |
| full | 76 | 36,172 | ~9,043 | baseline |
**Before v7.4:** Every session loaded all 76 tools (~9,000 tokens) before any work could begin.

**After v7.4:** The default `core` preset loads 14 tools (~1,500 tokens). The AI can expand the toolset dynamically with `expandToolset` when needed.

Token estimates use bytes / 4, a conservative ratio for structured JSON.
### v7.4 SCIP Backend Optimizations
The SCIP backend uses pre-computed indexes for dramatically faster lookups:
| Operation | Before | After | Improvement |
|---|---|---|---|
| FindReferences | 340μs | 2.5μs | 136x |
| SearchSymbols | 930μs | 136μs | 7x |
| FindSymbolLocation | 70μs | 28ns | 2,500x |
| GetCachedSymbol | 210ns | 7.5ns | 28x |
**Implementation Details:**

| Index | Purpose | Complexity |
|---|---|---|
| RefIndex | Inverted index: symbolId → occurrences | O(1) lookup vs O(n×m) scan |
| ConvertedSymbols | Pre-converted SCIPSymbol cache | Avoids repeated SCIP parsing |
| ContainerIndex | Maps occurrence positions to containing symbols | O(1) lookup vs O(n²) scan |
| findSymbolLocationFast | Definition lookup via RefIndex | O(k), where k = occurrences |
These indexes are built when the SCIP index is loaded, at a modest memory cost (~20-30% increase).
### v7.4 Git Backend Optimizations

The `getHotspots` tool was dramatically sped up by consolidating git commands:
| Operation | Before | After | Improvement |
|---|---|---|---|
| getHotspots (20 files) | 26.7s | 498ms | 53x |
**Problem:** For each changed file, the old code ran four separate git commands:

- `git rev-list --count` (commit count)
- `git shortlog -sn` (authors)
- `git log` (last modified)
- `git log --numstat` (line changes)
With 100+ files changed in 30 days, that meant 400+ process spawns.

**Solution:** A single `git log --format=%H|%an|%aI --numstat` command yields all of this data in one pass.
### Helper Function Performance
In-memory processing functions complete in nanoseconds to microseconds:
| Function | Time | Description |
|---|---|---|
| classifyFileRiskLevel | 1.0 ns | Risk classification for diff files |
| classifyHotspotRisk | 0.77 ns | Churn-based risk assessment |
| computeDiffConfidence | 2.2 ns | Confidence calculation |
| computePathConfidence | 1.0 ns | Path confidence from basis |
| detectLanguage | 7.3 ns | Language from file extension |
| suggestTestPath | 19 ns | Test file path generation |
| titleCase | 29 ns | Simple title casing |
| classifyRecency | 43 ns | Timestamp recency classification |
| computeRecencyScore | 44 ns | Recency scoring |
| classifyFileRole | 78 ns | File role from path patterns |
| splitCamelCase | 116 ns | CamelCase word splitting |
| classifyPathRole | 297 ns | Full path role classification |
| categorizeConceptV52 | 561 ns | Concept categorization |
| buildDiffSummary | 674 ns | Diff summary text generation |
| extractConcept | 903 ns | Concept extraction from names |
### Pipeline Performance
Simulated tool processing with multiple items:
| Pipeline | Items | Time | Budget | Headroom |
|---|---|---|---|---|
| PathClassification | 10 paths | 3.0 µs | 300ms | 99.999% |
| DiffProcessing | 50 files | 8.9 µs | 2000ms | 99.999% |
| HotspotProcessing | 50 items | 10.1 µs | 2000ms | 99.999% |
| ConceptExtraction | 10 names | 14.9 µs | 2000ms | 99.999% |
### v6.0 Hotspot Benchmarks

| Function | Time | Description |
|---|---|---|
| CalculateInstability | 0.25 ns | Martin's instability metric |
| ComputeCompositeScore | 0.26 ns | Weighted hotspot score |
| NormalizeChurnScore | 0.25 ns | Churn normalization |
| NormalizeCouplingScore | 0.25 ns | Coupling normalization |
| NormalizeComplexityScore | 0.26 ns | Complexity normalization |
| CalculateTrend | 295 ns | Trend analysis (30 snapshots) |
| Pipeline | Items | Time | Budget | Headroom |
|---|---|---|---|---|
| HotspotScoring | 100 files | 69 ns | 2000ms | 99.999% |
| TrendAnalysis | 50 files × 10 snapshots | 5.7 µs | 2000ms | 99.999% |
### v6.0 Ownership Benchmarks

| Function | Time | Description |
|---|---|---|
| normalizeAuthorKey | 10 ns | Author key normalization |
| BlameOwnershipToOwners | 47 ns | Convert blame to owners |
| CodeownersToOwners | 56 ns | Convert CODEOWNERS to owners |
| isBot | 743 ns | Bot detection (regex) |
| matchPattern | 1.9 µs | Glob pattern matching |
| GetOwnersForPath | 51 µs | Resolve owners for path |
| Pipeline | Items | Time | Budget | Headroom |
|---|---|---|---|---|
| OwnershipResolution | 100 files × 50 rules | 9.2 ms | 300ms | 96.9% |
## v7.3 Incremental Indexing

Incremental indexing (Go only) makes `ckb index` O(changed files) instead of O(entire repo).

### Index Time Comparison
| Project Size | Full Index | Incremental (1 file) | Speedup |
|---|---|---|---|
| Small (100 files) | ~2s | ~0.5s | 4x |
| Medium (1000 files) | ~15s | ~1-2s | 10x |
| Large (10000 files) | ~60s | ~2-5s | 20x |
### Where Incremental Time Goes
| Phase | Time | Notes |
|---|---|---|
| Change detection | ~50ms | Git diff with -z flag |
| scip-go execution | ~1-2s | Still runs full indexer |
| Delta extraction | ~100ms | Only process changed docs |
| Database updates | ~50ms | Delete + insert pattern |
The `scip-go` step is unavoidable (protobuf doesn't support partial updates), but CKB only does expensive database work for changed files.
### Accuracy vs Speed Trade-off

| Index Type | Speed | Forward Refs | Reverse Refs |
|---|---|---|---|
| Full (`--force`) | Slower | 100% accurate | 100% accurate |
| Incremental | Faster | 100% accurate | May be stale |

Use `ckb index --force` when reverse reference accuracy is critical.
See Incremental Indexing for detailed accuracy guarantees.
## Where Time Actually Goes
In-memory processing is negligible. Real-world latency is dominated by I/O:
- **SCIP index lookups** - Symbol search, references, call graph traversal
- **Git history queries** - Commit history, diff stats, churn metrics
- **File system operations** - Directory traversal, file reads
This is by design - CKB's value is in orchestrating these I/O operations efficiently and compressing results for LLM consumption.
## Running Benchmarks

```bash
# Run all query benchmarks
go test ./internal/query/... -bench=. -benchmem -run=^$

# Run all v6.0 benchmarks
go test ./internal/hotspots/... ./internal/ownership/... -bench=. -benchmem -run=^$

# Run a specific benchmark
go test ./internal/query/... -bench=BenchmarkClassifyPathRole -benchmem -run=^$

# Run with CPU profiling
go test ./internal/query/... -bench=BenchmarkDiffProcessingPipeline -cpuprofile=cpu.prof -run=^$
```
## Optimization Tips

### For Users

- **Keep the SCIP index fresh** - Stale indexes cause fallback to slower backends
- **Use scoped queries** - Adding the `scope` parameter reduces the search space
- **Set reasonable limits** - Don't request 1000 results if you need 20
### For Contributors

- **Profile before optimizing** - Most time is in I/O, not CPU
- **Cache aggressively** - CKB's three-tier cache handles this
- **Batch I/O operations** - Fewer round trips beat faster processing
## Related
- Incremental Indexing - Fast index updates for Go projects
- MCP Integration - Tool documentation and usage
- Architecture - How CKB processes queries
- Configuration - Cache and backend settings