CKB Architecture

Overview

CKB (Code Knowledge Backend) is designed as a layered system that abstracts multiple code intelligence backends behind a unified query interface. The architecture has evolved through several versions:

  • v6.0 — Architectural Memory (persistent knowledge)
  • v6.1 — Background Jobs & CI/CD Integration
  • v6.2 — Federation (cross-repo queries)
  • v6.2.1 — Daemon Mode (always-on service)
  • v6.2.2 — Tree-sitter Complexity
  • v6.3 — Contract-Aware Impact Analysis
  • v6.4 — Runtime Telemetry
  • v6.5 — Developer Intelligence
  • v7.0 — Zero-Friction UX (npm distribution, auto-setup)
  • v7.1 — Tree-sitter Symbol Fallback (no-index operation)
  • v7.2 — Multi-Tool Setup & Analysis Tiers
  • v7.3 — Doc-Symbol Linking, Incremental Indexing & Callgraph
┌─────────────────────────────────────────────────────────┐
│                    Interfaces                            │
│  ┌─────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │   CLI   │  │  HTTP API   │  │     MCP Server      │  │
│  └────┬────┘  └──────┬──────┘  └──────────┬──────────┘  │
└───────┼──────────────┼────────────────────┼─────────────┘
        │              │                    │
        └──────────────┼────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────────┐
│                   Query Engine                           │
│  ┌────────────┐  ┌────────────┐  ┌────────────────────┐ │
│  │   Router   │  │  Merger    │  │    Compressor      │ │
│  └────────────┘  └────────────┘  └────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│              Architectural Memory (v6.0)                 │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌──────┐ │
│  │  Modules  │  │ Ownership │  │ Hotspots  │  │ ADRs │ │
│  │  Registry │  │  Registry │  │  Tracker  │  │      │ │
│  └───────────┘  └───────────┘  └───────────┘  └──────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│            Cross-Cutting Concerns (v6.1-v6.4)            │
│  ┌─────────┐ ┌──────────┐ ┌───────────┐ ┌────────────┐ │
│  │  Jobs   │ │Federation│ │ Complexity│ │ Telemetry  │ │
│  │ (v6.1)  │ │  (v6.2)  │ │  (v6.2.2) │ │   (v6.4)   │ │
│  └─────────┘ └──────────┘ └───────────┘ └────────────┘ │
│  ┌─────────┐ ┌──────────┐ ┌───────────────────────────┐ │
│  │ Daemon  │ │Contracts │ │      (v6.2.1 Services)    │ │
│  │ (v6.2.1)│ │  (v6.3)  │ │ Scheduler│Watcher│Webhooks│ │
│  └─────────┘ └──────────┘ └───────────────────────────┘ │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                   Backend Layer                          │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌───────────┐  │
│  │  SCIP   │  │   LSP   │  │   Git   │  │Tree-sitter│  │
│  └─────────┘  └─────────┘  └─────────┘  └───────────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────┐
│                   Storage Layer                          │
│  ┌────────────────┐  ┌────────────────────────────────┐ │
│  │    SQLite      │  │         Cache Tiers            │ │
│  │  (Symbols,     │  │  Query │ View │ Negative       │ │
│  │   Ownership,   │  │  Cache │ Cache│ Cache          │ │
│  │   Decisions,   │  │                                │ │
│  │   Telemetry)   │  │                                │ │
│  └────────────────┘  └────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Core Components

1. Interface Layer

CLI (cmd/ckb/)

  • Cobra-based command structure
  • Human-readable output
  • Interactive commands

HTTP API (internal/api/)

  • REST endpoints
  • JSON responses
  • OpenAPI specification
  • Middleware (logging, CORS, recovery)

MCP Server (internal/mcp/)

  • Model Context Protocol implementation
  • Tool definitions for AI assistants
  • Streaming support

2. Query Engine

Router

Routes queries to appropriate backends based on:

  • Query type (definition, references, search)
  • Backend availability
  • Query policy configuration

Merger

Combines results from multiple backends:

  • prefer-first: Use first successful response
  • union: Merge all responses, deduplicate

Important: Backend Query Behavior

The orchestrator selects backends upfront, before any query runs:

  • prefer-first mode: Only the primary backend (the highest-priority one available) is selected and queried. If it fails, the query fails; there is no sequential fallback to secondary backends.
  • union mode: All available backends in the preference order are queried in parallel. If any backend succeeds, the query succeeds, so a failure in one backend does not affect the others.

For maximum resilience, use union mode. The prefer-first mode is optimized for speed (a single backend query) at the cost of resilience.
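
The selection rule can be sketched in a few lines of Go. This is an illustration only: `selectBackends` and the `available` map are hypothetical names, not CKB's orchestrator API.

```go
package main

import "fmt"

// selectBackends returns the backends that would be queried for a merge
// strategy. ladder is the configured preference order; available reports
// which backends are currently usable. All names are illustrative.
func selectBackends(strategy string, ladder []string, available map[string]bool) []string {
	var selected []string
	for _, name := range ladder {
		if !available[name] {
			continue
		}
		selected = append(selected, name)
		if strategy == "prefer-first" {
			break // only the highest-priority available backend is queried
		}
	}
	return selected
}

func main() {
	ladder := []string{"scip", "lsp", "git"}
	available := map[string]bool{"lsp": true, "git": true} // scip is missing

	fmt.Println(selectBackends("prefer-first", ladder, available)) // [lsp]
	fmt.Println(selectBackends("union", ladder, available))        // [lsp git]
}
```

In prefer-first mode a failure of the single selected backend ends the query, which is why the selection returns at most one entry.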

Compressor (internal/compression/)

Optimizes responses for LLM consumption:

  • Enforces response budgets
  • Truncates with drilldown suggestions
  • Deduplicates results

3. Backend Layer

SCIP Backend

  • Reads pre-computed SCIP indexes
  • Fastest and most accurate
  • Requires index generation
  • v7.4 Optimizations:
    • RefIndex: Inverted index for O(1) reference lookups
    • ConvertedSymbols: Pre-converted symbol cache
    • findSymbolLocationFast: O(k) definition lookup

LSP Backend

  • Communicates with language servers
  • Real-time analysis
  • May require workspace initialization

Git Backend

  • Fallback for basic operations
  • File listing, blame, history
  • Always available in git repos

4. Architectural Memory Layer (v6.0)

v6.0 introduces persistent architectural knowledge that survives across sessions.

Module Registry (internal/modules/)

  • Tracks module boundaries from MODULES.toml or inference
  • Stores responsibilities, ownership, and tags
  • Supports declared (explicit) and inferred (automatic) modules

Ownership Registry (internal/ownership/)

  • Parses CODEOWNERS files (confidence: 1.0)
  • Computes git-blame ownership (confidence: 0.79)
  • Tracks ownership history over time
  • Merges sources with priority: CODEOWNERS > blame > heuristic

Hotspot Tracker

  • Stores historical hotspot snapshots (append-only)
  • Computes trends (increasing/stable/decreasing)
  • Projects future scores based on velocity

Decision Log (internal/decisions/)

  • Parses ADR markdown files
  • Indexes decisions for search
  • Links decisions to affected modules

5. Storage Layer

SQLite Database (.ckb/ckb.db)

Core Tables (v5.x):

  • symbol_mappings - Stable ID to backend ID mappings
  • symbol_aliases - Redirect mappings for renamed symbols
  • modules - Detected modules cache
  • dependency_edges - Module dependency graph

Architectural Memory Tables (v6.0):

  • ownership - Ownership rules with source and confidence
  • ownership_history - Ownership changes over time (append-only)
  • hotspot_snapshots - Historical hotspot metrics (append-only)
  • responsibilities - Module/file responsibility descriptions
  • decisions - ADR metadata (content in markdown files)
  • module_renames - Tracks module ID changes across renames

Full-Text Search:

  • decisions_fts - FTS5 index for decision search
  • responsibilities_fts - FTS5 index for responsibility search

Cache Tiers

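| Tier           | TTL    | Key Contains | Use Case                |
|----------------|--------|--------------|-------------------------|
| Query Cache    | 5 min  | headCommit   | Frequent queries        |
| View Cache     | 1 hour | repoStateId  | Expensive computations  |
| Negative Cache | 5-60s  | repoStateId  | Avoid repeated failures |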

Persistence Model (v6.0)

~/.ckb/
├── config.toml              # global config
└── repos/
    └── <repo-hash>/
        ├── ckb.db            # unified SQLite database
        ├── decisions/        # ADR markdown files (canonical)
        │   ├── ADR-001-*.md
        │   └── ...
        └── index.scip        # SCIP index

Data Classification:

| Data Type           | Classification        | Rebuild Behavior       |
|---------------------|-----------------------|------------------------|
| Declared modules    | Canonical             | Preserved              |
| Inferred modules    | Derived               | Regenerated            |
| CODEOWNERS rules    | Canonical             | Reparsed from file     |
| Git-blame ownership | Derived               | Regenerated            |
| Hotspot snapshots   | Derived (append-only) | Kept; new appended     |
| ADR files           | Canonical             | Never rebuilt          |
| ADR index           | Derived               | Regenerated from files |

Key Subsystems

Identity System (internal/identity/)

Provides stable symbol identification across refactors.

┌─────────────────────────────────────────┐
│           Symbol Identity               │
│                                         │
│  Stable ID: ckb:repo:sym:<fingerprint>  │
│                                         │
│  Fingerprint = hash(                    │
│    container + name + kind + signature  │
│  )                                      │
└─────────────────────────────────────────┘
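
A minimal sketch of how such a fingerprint could be derived. The hash function (SHA-256), the NUL field separator, the 16-character truncation, and the treatment of `repo` as a repository name are all assumptions made for illustration; the source only states which fields are hashed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes the four identity fields. Joining with a NUL byte keeps
// ("ab", "c") distinct from ("a", "bc"). Hash choice and length are assumed.
func fingerprint(container, name, kind, signature string) string {
	joined := strings.Join([]string{container, name, kind, signature}, "\x00")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:8]) // 8 bytes -> 16 hex characters
}

// stableID formats an ID in the ckb:<repo>:sym:<fingerprint> shape.
func stableID(repo, fp string) string {
	return fmt.Sprintf("ckb:%s:sym:%s", repo, fp)
}

func main() {
	fp := fingerprint("internal/query", "Engine.Search", "method", "func(q string) []Symbol")
	fmt.Println(stableID("example-repo", fp))
}
```

Because file path and line number are not part of the hash, moving a symbol within a file leaves its ID unchanged under this scheme.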

Alias Resolution:

Old ID ──alias──> New ID ──alias──> Current ID
         │                 │
         └── max depth: 3 ─┘

Impact Analysis (internal/impact/)

Analyzes the blast radius of code changes.

┌─────────────────────────────────────────┐
│           Impact Analysis               │
│                                         │
│  1. Derive Visibility                   │
│     - SCIP modifiers (0.95 confidence)  │
│     - Reference patterns (0.7-0.9)      │
│     - Naming conventions (0.5-0.7)      │
│                                         │
│  2. Classify References                 │
│     - direct-caller                     │
│     - transitive-caller                 │
│     - type-dependency                   │
│     - test-dependency                   │
│                                         │
│  3. Compute Risk Score                  │
│     - Visibility (30%)                  │
│     - Direct callers (35%)              │
│     - Module spread (25%)               │
│     - Impact kind (10%)                 │
└─────────────────────────────────────────┘
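
The weighting in step 3 amounts to a weighted sum. The sketch below assumes each factor has already been normalized to the range 0-1; the `riskFactors` type and that normalization are hypothetical, and only the four weights come from the diagram.

```go
package main

import "fmt"

// riskFactors holds the four inputs, each assumed to be normalized to [0, 1].
type riskFactors struct {
	Visibility    float64
	DirectCallers float64
	ModuleSpread  float64
	ImpactKind    float64
}

// riskScore combines the factors with the weights 30/35/25/10.
func riskScore(f riskFactors) float64 {
	return 0.30*f.Visibility +
		0.35*f.DirectCallers +
		0.25*f.ModuleSpread +
		0.10*f.ImpactKind
}

func main() {
	// A public symbol with many direct callers concentrated in few modules.
	f := riskFactors{Visibility: 1.0, DirectCallers: 0.8, ModuleSpread: 0.2, ImpactKind: 0.5}
	fmt.Printf("risk = %.2f\n", riskScore(f))
}
```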

Deterministic Output (internal/output/)

Ensures identical queries produce identical bytes.

Guarantees:

  • Stable key ordering (alphabetical)
  • Float precision (6 decimals)
  • Consistent sorting (multi-field, stable)
  • Nil/empty field omission
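
These guarantees map naturally onto Go's `encoding/json`, which emits struct fields in declaration order and sorts map keys. The sketch below is an assumption about how they could be met, not the contents of `internal/output/`; the `result` type and its sort order are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// result declares its fields in alphabetical key order, because encoding/json
// writes struct fields in declaration order. omitempty drops empty fields.
type result struct {
	Name  string  `json:"name"`
	Note  string  `json:"note,omitempty"`
	Score float64 `json:"score"`
}

// round6 limits a float to 6 decimal places.
func round6(v float64) float64 { return math.Round(v*1e6) / 1e6 }

// encode rounds, sorts (score descending, then name), and marshals a copy.
func encode(results []result) ([]byte, error) {
	out := make([]result, len(results))
	copy(out, results)
	for i := range out {
		out[i].Score = round6(out[i].Score)
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].Name < out[j].Name
	})
	return json.Marshal(out)
}

func main() {
	b, err := encode([]result{{Name: "b", Score: 0.5}, {Name: "a", Score: 0.5}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // [{"name":"a","score":0.5},{"name":"b","score":0.5}]
}
```

The secondary sort key matters: without it, two results with equal scores could swap places depending on backend response order.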

Ownership Algorithm (v6.0)

Computes code ownership from git blame with time decay.

┌─────────────────────────────────────────┐
│           Ownership Algorithm           │
│                                         │
│  1. Run git blame on file               │
│  2. Filter out bots + merge commits     │
│  3. Apply time decay:                   │
│     weight = 0.5 ^ (age / 90 days)      │
│  4. Normalize weights to 0-1            │
│  5. Assign scope:                       │
│     >= 50% → maintainer                 │
│     >= 20% → reviewer                   │
│     >= 5%  → contributor                │
└─────────────────────────────────────────┘
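
Steps 3 to 5 can be sketched as follows. The `blameLine` type is hypothetical, and the input is assumed to be already filtered for bots and merge commits (step 2); the decay formula and thresholds are the ones shown above.

```go
package main

import (
	"fmt"
	"math"
)

// blameLine is one blamed line: who wrote it and how old it is.
type blameLine struct {
	Author  string
	AgeDays float64
}

// ownership applies the 90-day half-life decay and normalizes per-author
// weights so that they sum to 1.
func ownership(lines []blameLine) map[string]float64 {
	weights := map[string]float64{}
	total := 0.0
	for _, l := range lines {
		w := math.Pow(0.5, l.AgeDays/90)
		weights[l.Author] += w
		total += w
	}
	for author := range weights {
		weights[author] /= total
	}
	return weights
}

// scope maps a normalized share to a scope; below 5% no scope is assigned.
func scope(share float64) string {
	switch {
	case share >= 0.50:
		return "maintainer"
	case share >= 0.20:
		return "reviewer"
	case share >= 0.05:
		return "contributor"
	default:
		return ""
	}
}

func main() {
	shares := ownership([]blameLine{
		{Author: "alice", AgeDays: 0},  // weight 1.0
		{Author: "bob", AgeDays: 90},   // weight 0.5
	})
	for _, a := range []string{"alice", "bob"} {
		fmt.Printf("%s: %.2f %s\n", a, shares[a], scope(shares[a]))
	}
}
```

With one fresh line and one 90-day-old line, the shares are 2/3 and 1/3, so the first author is a maintainer and the second a reviewer.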

Source Priority:

  1. CODEOWNERS file (confidence: 1.0)
  2. Git blame (confidence: 0.79)
  3. Heuristics (confidence: 0.59)

Staleness Model (v6.0)

Architectural data can become stale:

| Staleness | Condition                     | Action           |
|-----------|-------------------------------|------------------|
| fresh     | < 7 days, < 50 commits        | Use as-is        |
| aging     | 7-30 days or 50-200 commits   | Use with warning |
| stale     | 30-90 days or 200-500 commits | Suggest refresh  |
| obsolete  | > 90 days or > 500 commits    | Require refresh  |
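
A sketch of the classification, taking the more severe of the two signals. The table's ranges share their endpoints (30 days appears under both aging and stale), so the boundary handling below is an assumption: a boundary value is assigned to the more severe level, except that obsolete requires strictly more than 90 days or 500 commits.

```go
package main

import "fmt"

// staleness classifies data by its age in days and by the number of commits
// made since it was computed. Boundary handling is an assumption.
func staleness(ageDays, commitsBehind int) string {
	switch {
	case ageDays > 90 || commitsBehind > 500:
		return "obsolete"
	case ageDays >= 30 || commitsBehind >= 200:
		return "stale"
	case ageDays >= 7 || commitsBehind >= 50:
		return "aging"
	default:
		return "fresh"
	}
}

func main() {
	fmt.Println(staleness(3, 10))  // fresh
	fmt.Println(staleness(3, 120)) // aging: recent, but many commits behind
	fmt.Println(staleness(45, 10)) // stale
	fmt.Println(staleness(120, 0)) // obsolete
}
```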

Repository State (internal/repostate/)

Tracks repository state for cache invalidation.

RepoStateID = hash(
  headCommit +
  stagedDiffHash +
  workingTreeDiffHash +
  untrackedListHash
)
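
A minimal sketch of this composition. SHA-256 and the NUL separator between components are assumptions; the source names only the four inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// repoStateID hashes the four state components in a fixed order. A NUL byte
// after each component keeps adjacent values from running together.
func repoStateID(headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash string) string {
	h := sha256.New()
	for _, part := range []string{headCommit, stagedDiffHash, workingTreeDiffHash, untrackedListHash} {
		h.Write([]byte(part))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	clean := repoStateID("abc123", "", "", "")
	dirty := repoStateID("abc123", "", "f00d", "")
	fmt.Println(clean != dirty) // true: an uncommitted edit changes the ID
}
```

Since the ID covers staged, unstaged, and untracked state, caches keyed on it are invalidated by local edits as well as by new commits, whereas a cache keyed only on headCommit is not.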

Data Flow

Query Flow

1. Request arrives (CLI/HTTP/MCP)
           │
           ▼
2. Parse parameters, validate
           │
           ▼
3. Check cache (query/view/negative)
           │
      ┌────┴────┐
      │ cached? │
      └────┬────┘
           │
     yes ──┴── no
      │        │
      ▼        ▼
4. Return   5. Route to backends
   cached      │
              ┌┴┐
              │ │ (one backend, or several in parallel)
              └┬┘
               │
               ▼
6. Merge results
               │
               ▼
7. Compress (apply budget)
               │
               ▼
8. Generate drilldowns
               │
               ▼
9. Cache result
               │
               ▼
10. Return response

Symbol Resolution Flow

1. Receive symbol ID
         │
         ▼
2. Check if alias exists
         │
    ┌────┴────┐
    │ alias?  │
    └────┬────┘
         │
   yes ──┴── no
    │        │
    ▼        │
3. Follow   │
   chain    │
   (max 3)  │
    │        │
    └────┬───┘
         │
         ▼
4. Return resolved symbol
   (with redirect info if aliased)
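
The resolution flow can be sketched with a map of aliases. The `resolve` function is hypothetical, and returning an error when the chain exceeds three hops is an assumption; the source specifies only the maximum depth.

```go
package main

import (
	"errors"
	"fmt"
)

const maxAliasDepth = 3

var errAliasChainTooDeep = errors.New("alias chain exceeds max depth")

// resolve follows alias redirects until it reaches an ID with no alias.
// It returns the resolved ID and the IDs it was redirected through.
func resolve(id string, aliases map[string]string) (string, []string, error) {
	var redirects []string
	current := id
	for depth := 0; ; depth++ {
		next, ok := aliases[current]
		if !ok {
			return current, redirects, nil
		}
		if depth == maxAliasDepth {
			// Also stops alias cycles, which would otherwise loop forever.
			return "", redirects, errAliasChainTooDeep
		}
		redirects = append(redirects, current)
		current = next
	}
}

func main() {
	aliases := map[string]string{"old": "newer", "newer": "current"}
	id, via, err := resolve("old", aliases)
	fmt.Println(id, via, err) // current [old newer] <nil>
}
```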

Configuration

Query Policy

{
  "queryPolicy": {
    "backendLadder": ["scip", "lsp", "git"],
    "mergeStrategy": "prefer-first"
  }
}

Response Budget

{
  "budget": {
    "maxModules": 10,
    "maxSymbolsPerModule": 5,
    "maxImpactItems": 20,
    "maxDrilldowns": 5,
    "estimatedMaxTokens": 4000
  }
}

Backend Limits

{
  "backendLimits": {
    "maxRefsPerQuery": 10000,
    "maxSymbolsPerSearch": 1000,
    "maxFilesScanned": 5000,
    "maxUnionModeTimeMs": 60000
  }
}

Error Handling

Error Taxonomy (internal/errors/)

All errors include:

  • Error code (machine-readable)
  • Message (human-readable)
  • Details (context-specific)
  • Suggested fixes
  • Drilldown queries

Negative Caching

Failed queries are cached to avoid repeated failures:

| Error Type          | TTL | Triggers Warmup |
|---------------------|-----|-----------------|
| symbol-not-found    | 60s | No              |
| backend-unavailable | 15s | No              |
| workspace-not-ready | 10s | Yes             |
| timeout             | 5s  | No              |
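
The table translates directly into a lookup of per-error policies. The type and function names below are hypothetical; only the TTL values and warmup flags are taken from the table.

```go
package main

import (
	"fmt"
	"time"
)

// negativePolicy describes how long a failure is cached and whether caching
// it should also trigger a backend warmup.
type negativePolicy struct {
	TTL            time.Duration
	TriggersWarmup bool
}

var negativePolicies = map[string]negativePolicy{
	"symbol-not-found":    {TTL: 60 * time.Second},
	"backend-unavailable": {TTL: 15 * time.Second},
	"workspace-not-ready": {TTL: 10 * time.Second, TriggersWarmup: true},
	"timeout":             {TTL: 5 * time.Second},
}

// stillCached reports whether a failure of the given type and age should
// still be served from the negative cache. Unknown types are never cached.
func stillCached(errorType string, age time.Duration) bool {
	p, ok := negativePolicies[errorType]
	return ok && age < p.TTL
}

func main() {
	fmt.Println(stillCached("symbol-not-found", 30*time.Second)) // true
	fmt.Println(stillCached("timeout", 6*time.Second))           // false
}
```

Short TTLs for transient failures (timeout, backend-unavailable) let the system recover quickly, while a missing symbol is unlikely to appear within a minute.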

Extension Points

Adding a New Backend

  1. Implement backend interface in internal/backends/
  2. Register in backend factory
  3. Add to configuration schema
  4. Update backend ladder options

Adding a New Tool

  1. Add handler in internal/api/handlers.go
  2. Register route in internal/api/routes.go
  3. Add MCP tool definition in internal/mcp/
  4. Update OpenAPI spec

Adding a New Cache Tier

  1. Add table in internal/storage/schema.go
  2. Implement cache methods in internal/storage/cache.go
  3. Define invalidation triggers

Cross-Cutting Subsystems (v6.1-v6.4)

Background Jobs (internal/jobs/) — v6.1

Async job execution for long-running operations.

┌─────────────────────────────────────────┐
│            Job Queue                     │
│                                         │
│  States: queued → running → completed   │
│                    ↓                    │
│                  failed                 │
│                                         │
│  Job Types:                             │
│  - refresh_architecture                 │
│  - analyze_impact                       │
│  - federation_sync                      │
│  - export                               │
└─────────────────────────────────────────┘

Features:

  • SQLite-backed persistence
  • Progress tracking with percentage
  • Cancellation support
  • Result storage

Federation (internal/federation/) — v6.2

Cross-repository query aggregation.

┌─────────────────────────────────────────┐
│           Federation                     │
│                                         │
│  ~/.ckb/federation/<name>/              │
│  ├── config.toml   (repo list)          │
│  └── index.db      (aggregated data)    │
│                                         │
│  Indexed Data:                          │
│  - Modules (top N per repo)             │
│  - Ownership patterns                   │
│  - Hotspots (top 20 per repo)           │
│  - Decisions (all)                      │
│  - Contracts (v6.3)                     │
└─────────────────────────────────────────┘

Staleness Model:

  • fresh: all repos synced < 24h
  • aging: some repos 1-7 days old
  • stale: some repos 7-30 days old
  • obsolete: some repos > 30 days old

Daemon Mode (internal/daemon/) — v6.2.1

Always-on background service.

┌─────────────────────────────────────────┐
│           Daemon Process                 │
│                                         │
│  ┌───────────┐  ┌───────────────────┐  │
│  │ HTTP API  │  │    Components     │  │
│  │ :9120     │  │                   │  │
│  └───────────┘  │  ┌─────────────┐  │  │
│                 │  │  Scheduler  │  │  │
│  Storage:       │  │  (cron/int) │  │  │
│  ~/.ckb/daemon/ │  ├─────────────┤  │  │
│  ├── daemon.pid │  │   Watcher   │  │  │
│  ├── daemon.log │  │  (fsnotify) │  │  │
│  └── daemon.db  │  ├─────────────┤  │  │
│                 │  │  Webhooks   │  │  │
│                 │  │  (outbound) │  │  │
│                 │  └─────────────┘  │  │
└─────────────────────────────────────────┘

Scheduler (internal/scheduler/):

  • Cron expressions: */5 * * * *
  • Interval syntax: every 30m, daily at 02:00
  • Task types: refresh, federation_sync, cleanup, health_check

Watcher (internal/watcher/):

  • Monitors .git/HEAD and .git/index
  • Debounced refresh (default 5s)
  • Configurable ignore patterns

Webhooks (internal/webhooks/):

  • Formats: JSON, Slack, PagerDuty, Discord
  • HMAC-SHA256 signing
  • Exponential backoff retry (5 attempts)
  • Dead letter queue
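
Signing and retry scheduling can be sketched with the standard library. The hex encoding of the signature, the one-second base delay, and the doubling factor are assumptions; the source specifies only HMAC-SHA256 and five attempts with exponential backoff.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// sign returns the hex-encoded HMAC-SHA256 of the payload.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

// backoffDelays returns the wait before each attempt after a failure,
// doubling every time.
func backoffDelays(base time.Duration, attempts int) []time.Duration {
	delays := make([]time.Duration, 0, attempts)
	d := base
	for i := 0; i < attempts; i++ {
		delays = append(delays, d)
		d *= 2
	}
	return delays
}

func main() {
	secret, body := []byte("shared-secret"), []byte(`{"event":"refresh_completed"}`)
	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))         // true
	fmt.Println(backoffDelays(time.Second, 5))     // [1s 2s 4s 8s 16s]
}
```

A receiver should use a constant-time comparison such as `hmac.Equal` rather than `==` on strings, to avoid leaking how many leading bytes matched.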

Tree-sitter Complexity (internal/complexity/) — v6.2.2

Language-agnostic code complexity metrics.

┌─────────────────────────────────────────┐
│        Complexity Analysis               │
│                                         │
│  Supported: Go, JS, TS, Python,         │
│             Rust, Java, Kotlin          │
│                                         │
│  Metrics:                               │
│  ┌─────────────────────────────────┐   │
│  │ Cyclomatic = Σ decision points  │   │
│  │   (if, for, while, switch,      │   │
│  │    case, &&, ||, catch, ?:)     │   │
│  ├─────────────────────────────────┤   │
│  │ Cognitive = Σ (nesting × cost)  │   │
│  │   Penalizes deep nesting        │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Integration: feeds getHotspots risk   │
└─────────────────────────────────────────┘
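
To make the cyclomatic count concrete, the sketch below counts decision points in Go source. Note the substitution: it uses the standard library's `go/ast` rather than Tree-sitter, so it covers Go only and is not how CKB performs the analysis. Starting each function at 1 follows the usual McCabe convention and is an assumption here; `default` clauses are not counted.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// cyclomatic returns 1 plus the number of decision points for each function
// declared in src.
func cyclomatic(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "snippet.go", src, 0)
	if err != nil {
		return nil, err
	}
	scores := map[string]int{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		score := 1
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			switch x := n.(type) {
			case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
				score++
			case *ast.CaseClause:
				if x.List != nil { // a nil List is the default clause
					score++
				}
			case *ast.BinaryExpr:
				if x.Op == token.LAND || x.Op == token.LOR {
					score++
				}
			}
			return true
		})
		scores[fn.Name.Name] = score
	}
	return scores, nil
}

func main() {
	src := `package p
func f(a, b int) int {
	if a > 0 && b > 0 {
		return 1
	}
	for i := 0; i < a; i++ {
		switch i {
		case 1:
			b++
		default:
		}
	}
	return b
}`
	scores, err := cyclomatic(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(scores["f"]) // 5: base 1 + if + && + for + case
}
```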

Contract Analysis (internal/federation/contracts/) — v6.3

Cross-repo API contract tracking.

┌─────────────────────────────────────────┐
│        Contract Analysis                 │
│                                         │
│  Contract Types:                        │
│  - proto (.proto files)                 │
│  - openapi (.yaml/.json with openapi)   │
│                                         │
│  Visibility Classification:             │
│  - public: api/, proto/, versioned      │
│  - internal: internal/, testdata/       │
│  - unknown: no clear signals            │
│                                         │
│  Evidence Tiers:                        │
│  ┌────────────────────────────────┐    │
│  │ Tier 1 (declared): buf.yaml,   │    │
│  │   proto imports, *.pb.go       │    │
│  │   Confidence: 1.0              │    │
│  ├────────────────────────────────┤    │
│  │ Tier 2 (derived): type match,  │    │
│  │   package refs                 │    │
│  │   Confidence: 0.7-0.9          │    │
│  ├────────────────────────────────┤    │
│  │ Tier 3 (heuristic): naming     │    │
│  │   patterns (hidden by default) │    │
│  │   Confidence: ≤0.5             │    │
│  └────────────────────────────────┘    │
│                                         │
│  Risk Assessment:                       │
│  - Low: ≤2 consumers, internal         │
│  - Medium: 3-5 consumers               │
│  - High: >5 consumers, public, services│
└─────────────────────────────────────────┘

Runtime Telemetry (internal/telemetry/) — v6.4

Observed usage from production runtime.

┌─────────────────────────────────────────┐
│        Telemetry Integration             │
│                                         │
│  Ingest:                                │
│  ┌─────────────────────────────────┐   │
│  │ OTLP (/v1/metrics)              │   │
│  │ JSON (/api/v1/ingest/json)      │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Symbol Matching                 │   │
│  │                                 │   │
│  │ Exact:  file + func + line     │   │
│  │         → confidence 0.95      │   │
│  │ Strong: file + func            │   │
│  │         → confidence 0.85      │   │
│  │ Weak:   namespace + func       │   │
│  │         → confidence 0.60      │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Storage (SQLite)                │   │
│  │ - observed_usage table          │   │
│  │ - Weekly/monthly buckets        │   │
│  │ - 365-day retention             │   │
│  └─────────────────────────────────┘   │
│             │                           │
│             ▼                           │
│  ┌─────────────────────────────────┐   │
│  │ Coverage Model                  │   │
│  │                                 │   │
│  │ attribute: % events w/ attrs   │   │
│  │ match: % matched to symbols    │   │
│  │ service: % repos w/ telemetry  │   │
│  │ overall: weighted average      │   │
│  │                                 │   │
│  │ Levels:                        │   │
│  │ high (≥0.8): full features     │   │
│  │ medium (≥0.6): with warnings   │   │
│  │ low (≥0.4): usage only         │   │
│  │ insufficient (<0.4): disabled  │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
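
The match tiers and coverage levels in the diagram reduce to two small classification functions. The sketch below decides the match tier from which attributes an event carries; that simplification, and the `event` type, are assumptions, since real matching also has to look the attributes up against indexed symbols.

```go
package main

import "fmt"

// event carries the attributes a telemetry data point may include.
type event struct {
	File      string
	Namespace string
	Function  string
	Line      int
}

// matchQuality returns the match tier and confidence available for an event.
func matchQuality(e event) (string, float64) {
	switch {
	case e.File != "" && e.Function != "" && e.Line > 0:
		return "exact", 0.95
	case e.File != "" && e.Function != "":
		return "strong", 0.85
	case e.Namespace != "" && e.Function != "":
		return "weak", 0.60
	default:
		return "unmatched", 0
	}
}

// coverageLevel maps the overall coverage score to a feature level.
func coverageLevel(overall float64) string {
	switch {
	case overall >= 0.8:
		return "high"
	case overall >= 0.6:
		return "medium"
	case overall >= 0.4:
		return "low"
	default:
		return "insufficient"
	}
}

func main() {
	tier, conf := matchQuality(event{File: "api/handler.go", Function: "Serve"})
	fmt.Println(tier, conf)           // strong 0.85
	fmt.Println(coverageLevel(0.65))  // medium
}
```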

Dead Code Detection:

  • Requires medium+ coverage
  • Only exact/strong matches
  • Confidence capped at 0.90
  • Configurable exclusions (tests, migrations, scheduled jobs)

Impact Enrichment:

  • Adds observedImpact to analyzeImpact
  • Shows observed callers not found in static analysis
  • Comparison: staticOnly vs observedOnly vs both

Hotspot Enhancement:

  • Adds observedUsage field to hotspots
  • Usage weight (0.20) in scoring formula

Zero-Friction Subsystems (v7.x)

Tree-sitter Symbol Fallback (internal/symbols/) — v7.1

Zero-index operation for code intelligence without SCIP.

┌─────────────────────────────────────────┐
│      Tree-sitter Symbol Extraction       │
│                                         │
│  Languages: Go, TS, JS, TSX, Python,    │
│             Rust, Java, Kotlin          │
│                                         │
│  Triggers when:                         │
│  - No SCIP index exists                 │
│  - SCIP query fails                     │
│  - User requests basic tier             │
│                                         │
│  Output:                                │
│  - Source: "treesitter"                 │
│  - Confidence: 0.7                      │
│  - Symbol kind, name, location          │
└─────────────────────────────────────────┘

Extracted Symbols:

  • Functions and methods
  • Classes, structs, interfaces
  • Constants and variables
  • Type definitions

Analysis Tiers (internal/tier/) — v7.2

User-controllable analysis depth for performance/accuracy tradeoffs.

┌─────────────────────────────────────────┐
│           Analysis Tiers                 │
│                                         │
│  ┌─────────────────────────────────┐   │
│  │ Fast (Basic)                    │   │
│  │ - Tree-sitter symbols only      │   │
│  │ - No SCIP required              │   │
│  │ - ~100ms latency                │   │
│  ├─────────────────────────────────┤   │
│  │ Standard (Enhanced)             │   │
│  │ - SCIP index                    │   │
│  │ - Full references               │   │
│  │ - ~500ms latency                │   │
│  ├─────────────────────────────────┤   │
│  │ Full                            │   │
│  │ - SCIP + LSP                    │   │
│  │ - Complete analysis             │   │
│  │ - ~2s latency                   │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Precedence: CLI > ENV > Config > Auto  │
└─────────────────────────────────────────┘

Configuration:

  • CLI: ckb search "foo" --tier=fast
  • Env: CKB_TIER=standard
  • Config: "tier": "standard" in .ckb/config.json
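
The precedence rule (CLI > ENV > Config > Auto) can be sketched as a first-non-empty lookup. `resolveTier` is a hypothetical helper, and returning the literal string "auto" when nothing is set is an assumption about how automatic selection is represented.

```go
package main

import (
	"fmt"
	"os"
)

// resolveTier returns the effective tier and where it came from, checking
// the CLI flag first, then the environment, then the config file value.
func resolveTier(cliFlag, envValue, configValue string) (tier, source string) {
	switch {
	case cliFlag != "":
		return cliFlag, "cli"
	case envValue != "":
		return envValue, "env"
	case configValue != "":
		return configValue, "config"
	default:
		return "auto", "auto"
	}
}

func main() {
	// No --tier flag given; config file says "standard".
	tier, source := resolveTier("", os.Getenv("CKB_TIER"), "standard")
	fmt.Printf("tier=%s (from %s)\n", tier, source)
}
```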

Doc-Symbol Linking (internal/docs/) — v7.3

Bidirectional mapping between documentation and code symbols.

┌─────────────────────────────────────────┐
│        Doc-Symbol Linking                │
│                                         │
│  Detection Methods:                     │
│  ┌─────────────────────────────────┐   │
│  │ Backtick: `Foo.Bar`             │   │
│  │   Requires 2+ segments          │   │
│  │   Confidence: 1.0               │   │
│  ├─────────────────────────────────┤   │
│  │ Directive: <!-- ckb:symbol -->  │   │
│  │   Explicit full path            │   │
│  │   Confidence: 1.0               │   │
│  ├─────────────────────────────────┤   │
│  │ Fence: Code block identifiers   │   │
│  │   Tree-sitter extraction        │   │
│  │   Confidence: 0.7               │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Resolution:                            │
│  1. Exact match → confidence 1.0        │
│  2. Suffix match → confidence 0.95      │
│  3. Ambiguous → multiple candidates     │
│  4. Missing → symbol not found          │
│                                         │
│  Staleness Detection:                   │
│  - missing_symbol: deleted              │
│  - symbol_renamed: via alias chain      │
│  - ambiguous_symbol: multiple matches   │
│  - index_incomplete: lang not indexed   │
└─────────────────────────────────────────┘
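
Backtick detection and the resolution order can be sketched as follows. The regular expression assumes dot-separated identifier segments, and the function names are hypothetical; other separators or languages would need a broader pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// backtickRef matches a backtick span holding two or more dot-separated
// identifier segments, such as `Foo.Bar`. Single names are ignored.
var backtickRef = regexp.MustCompile("`([A-Za-z_][A-Za-z0-9_]*(?:\\.[A-Za-z_][A-Za-z0-9_]*)+)`")

// backtickRefs extracts candidate symbol references from documentation text.
func backtickRefs(text string) []string {
	var refs []string
	for _, m := range backtickRef.FindAllStringSubmatch(text, -1) {
		refs = append(refs, m[1])
	}
	return refs
}

// resolveRef tries an exact match (1.0), then a unique suffix match (0.95).
// Several suffix matches are ambiguous and none means missing; both return
// confidence 0 with the candidate list.
func resolveRef(ref string, known []string) ([]string, float64) {
	var suffix []string
	for _, sym := range known {
		if sym == ref {
			return []string{sym}, 1.0
		}
		if strings.HasSuffix(sym, "."+ref) {
			suffix = append(suffix, sym)
		}
	}
	if len(suffix) == 1 {
		return suffix, 0.95
	}
	return suffix, 0
}

func main() {
	refs := backtickRefs("Call `Engine.Search`, not `search`.")
	fmt.Println(refs) // [Engine.Search]

	known := []string{"query.Engine.Search", "api.Handler.Serve"}
	fmt.Println(resolveRef("Engine.Search", known)) // [query.Engine.Search] 0.95
}
```

Requiring two or more segments keeps ordinary inline code such as `true` or `main` from being treated as a symbol reference.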

MCP Tools (6 new):

  • indexDocs — Scan and index documentation
  • getDocsForSymbol — Find docs referencing a symbol
  • getSymbolsInDoc — List symbols in a document
  • getDocsForModule — Find docs linked to a module
  • checkDocStaleness — Check for stale references
  • getDocCoverage — Documentation coverage stats

Incremental Indexing (internal/incremental/) — v7.3

O(changed files) index updates instead of O(entire repo).

┌─────────────────────────────────────────┐
│        Incremental Indexing              │
│                                         │
│  Pipeline:                              │
│  ┌─────────────────────────────────┐   │
│  │ 1. Change Detection             │   │
│  │    git diff -z <last>..HEAD     │   │
│  │    Tracks: A/M/D/R              │   │
│  ├─────────────────────────────────┤   │
│  │ 2. SCIP Extraction              │   │
│  │    Run full scip-go             │   │
│  │    Filter to changed files only │   │
│  │    Extract symbols + calls      │   │
│  ├─────────────────────────────────┤   │
│  │ 3. Delta Application            │   │
│  │    DELETE old data for file     │   │
│  │    INSERT new symbols/edges     │   │
│  │    Caller-owned edge invariant  │   │
│  ├─────────────────────────────────┤   │
│  │ 4. Transitive Invalidation (v2) │   │
│  │    Track file dependencies      │   │
│  │    Enqueue dependent files      │   │
│  │    Drain on next full/eager     │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Availability: Go projects only         │
└─────────────────────────────────────────┘

Accuracy Guarantees:

| Query Type          | After Incremental | After Queue Drained |
|---------------------|-------------------|---------------------|
| Go to definition    | Always accurate   | Always accurate     |
| Find refs (forward) | Always accurate   | Always accurate     |
| Find refs (reverse) | May be stale      | Accurate            |
| Callees (outgoing)  | Always accurate   | Always accurate     |
| Callers (incoming)  | May be stale      | Accurate            |

Transitive Invalidation Modes:

  • none — Disabled
  • lazy — Enqueue, drain on --force (default)
  • eager — Enqueue and drain immediately
  • deferred — Drain periodically in background

Automatic Fallback:

  • Falls back to full reindex when >50% files changed
  • Falls back on schema version mismatch
  • Falls back when no tracked commit exists

Database Tables (Schema v7):

  • indexed_files — Per-file state and hash
  • file_symbols — Symbols per file
  • index_meta — Index state and commit tracking
  • callgraph — Call edges with location anchors
  • file_deps — File dependency graph (v2)
  • rescan_queue — Pending transitive rescans (v2)

Multi-Tool Setup (cmd/ckb/setup.go) — v7.2

Interactive MCP configuration for 6 AI coding tools.

┌─────────────────────────────────────────┐
│           ckb setup                      │
│                                         │
│  Supported Tools:                       │
│  ┌─────────────────────────────────┐   │
│  │ Claude Code  .mcp.json          │   │
│  │ Cursor       .cursor/mcp.json   │   │
│  │ Windsurf     ~/.codeium/...     │   │
│  │ VS Code      .vscode/mcp.json   │   │
│  │ OpenCode     opencode.json      │   │
│  │ Claude Desktop platform-specific│   │
│  └─────────────────────────────────┘   │
│                                         │
│  Modes:                                 │
│  - Interactive (default)               │
│  - Direct: --tool=cursor               │
│  - Portable: --npx                     │
│  - Global: --global                    │
└─────────────────────────────────────────┘

Index Management (internal/index/) — v7.2

Smart indexing with freshness tracking and skip-if-fresh.

┌─────────────────────────────────────────┐
│        Index Management                  │
│                                         │
│  Features:                              │
│  ┌─────────────────────────────────┐   │
│  │ Language Detection              │   │
│  │   go.mod, package.json, etc.    │   │
│  ├─────────────────────────────────┤   │
│  │ Indexer Discovery               │   │
│  │   Check if scip-* installed     │   │
│  │   Show install instructions     │   │
│  ├─────────────────────────────────┤   │
│  │ Skip-if-Fresh                   │   │
│  │   Compare HEAD to indexed commit│   │
│  │   Skip if unchanged             │   │
│  ├─────────────────────────────────┤   │
│  │ Lock File                       │   │
│  │   .ckb/index.lock (flock)       │   │
│  │   Prevents concurrent indexing  │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Metadata: .ckb/index-meta.json         │
│  - commitHash, fileCount, duration      │
└─────────────────────────────────────────┘
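
The skip-if-fresh check can be sketched as a comparison between the recorded commit and the current HEAD. The JSON shape below uses the `commitHash` and `fileCount` fields named in the diagram; the `duration` field is omitted because its format is not specified, and `shouldSkipIndex` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexMeta mirrors the assumed subset of .ckb/index-meta.json.
type indexMeta struct {
	CommitHash string `json:"commitHash"`
	FileCount  int    `json:"fileCount"`
}

// shouldSkipIndex reports whether the existing index was built at headCommit.
// Unreadable or empty metadata never counts as fresh.
func shouldSkipIndex(metaJSON []byte, headCommit string) (bool, error) {
	var meta indexMeta
	if err := json.Unmarshal(metaJSON, &meta); err != nil {
		return false, err
	}
	return meta.CommitHash != "" && meta.CommitHash == headCommit, nil
}

func main() {
	meta := []byte(`{"commitHash":"abc123","fileCount":42}`)

	skip, err := shouldSkipIndex(meta, "abc123")
	fmt.Println(skip, err) // true <nil>

	skip, err = shouldSkipIndex(meta, "def456")
	fmt.Println(skip, err) // false <nil>
}
```

In practice the caller would read the metadata file and obtain HEAD (for example from `git rev-parse HEAD`) before calling a check like this.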

Languages Supported (v7.2):

  • Go (scip-go)
  • TypeScript/JavaScript (scip-typescript)
  • Python (scip-python)
  • Rust (rust-analyzer + scip-rust)
  • Java (scip-java)
  • Kotlin (scip-kotlin)
  • C/C++ (scip-clang)
  • Dart (scip-dart)
  • Ruby (scip-ruby)
  • C# (scip-dotnet)
  • PHP (scip-php)