Incremental Indexing

Incremental indexing makes SCIP index updates O(changed files) instead of O(entire repo). After editing a file, the index updates in seconds instead of requiring a full reindex.

Availability: Go, TypeScript, JavaScript, Python, Dart, Rust (v7.5+). Other languages fall back to full reindexing.

v1.1 (v7.3): Adds incremental callgraph maintenance—outgoing calls from changed files are always accurate.

v2.0 (v7.3): Adds transitive invalidation—files depending on changed files can be automatically queued for rescanning.

v4.0 (v7.3): Adds CI-generated delta artifacts for O(delta) server-side ingestion.

v5.0 (v7.5): Adds multi-language support via indexer registry pattern.

Why Incremental Indexing?

Full SCIP indexing scans your entire codebase, which can take 30+ seconds for large projects. This creates friction:

  • During development: You edit one file but wait 30s for the index to update
  • In CI/CD: Every commit triggers a full reindex even if only one file changed
  • With watch mode: Frequent reindexes burn CPU and slow down your machine

Incremental indexing solves this by only processing changed files.

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────────┐
│ Change Detection│ ──► │ SCIP Extraction  │ ──► │ Delta Application│ ──► │ Transitive       │
│ (git diff -z)   │     │ (symbols + calls)│     │ (delete+insert) │     │ Invalidation (v2)│
└─────────────────┘     └──────────────────┘     └─────────────────┘     └──────────────────┘

1. Change Detection

CKB detects changes using git:

git diff --name-status -z <last-indexed-commit> HEAD

The -z flag uses NUL separators, correctly handling paths with spaces or special characters.

Tracked change types:

  • Added - New source files (.go in the examples below)
  • Modified - Changed source files
  • Deleted - Removed source files
  • Renamed - Moved/renamed source files (tracks old path for cleanup)
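
As an illustration of how NUL-separated output can be consumed, here is a minimal Python sketch. It is not CKB's implementation; the names Change and parse_name_status_z are made up for this example. It relies on git's documented -z layout: ordinary records are a status field followed by one path, while renames and copies carry a similarity score and two paths.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    status: str              # "A", "M", "D", "R" (or "C" for copies)
    path: str                # current path (the new path for renames)
    old_path: Optional[str]  # previous path, set only for renames/copies


def parse_name_status_z(raw: bytes) -> list[Change]:
    """Parse the output of `git diff --name-status -z`.

    Every field is NUL-terminated. Most records are <status>, <path>;
    renames and copies are <status><score>, <old path>, <new path>.
    """
    fields = raw.decode("utf-8", errors="surrogateescape").split("\0")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field after the final NUL

    changes = []
    i = 0
    while i < len(fields):
        code = fields[i][0]  # "R100" -> "R"
        if code in ("R", "C"):
            old, new = fields[i + 1], fields[i + 2]
            changes.append(Change(code, new, old))
            i += 3
        else:
            changes.append(Change(code, fields[i + 1], None))
            i += 2
    return changes


# Hand-written sample input, including a path that contains a space.
sample = b"M\0pkg/a.go\0A\0pkg/new file.go\0R100\0old.go\0new.go\0D\0gone.go\0"
changes = parse_name_status_z(sample)
for change in changes:
    print(change)
```

Because paths are never split on whitespace, "pkg/new file.go" survives intact, which is the reason for preferring -z over line-based parsing.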

Fallback: For non-git repos, CKB falls back to hash-based comparison against stored file hashes.
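
The hash-based fallback can be pictured roughly as follows. This is a sketch under assumptions: the page does not say which hash CKB uses or how hashes are stored, so SHA-256 and the plain dictionaries below are illustrative choices, and detect_changes is a hypothetical helper.

```python
import hashlib


def file_digest(data: bytes) -> str:
    # SHA-256 is an assumption; the page does not name the hash function.
    return hashlib.sha256(data).hexdigest()


def detect_changes(stored: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored path -> hash entries against current path -> contents."""
    added, modified = [], []
    for path, data in current.items():
        if path not in stored:
            added.append(path)
        elif stored[path] != file_digest(data):
            modified.append(path)
    deleted = [path for path in stored if path not in current]
    return {
        "added": sorted(added),
        "modified": sorted(modified),
        "deleted": sorted(deleted),
    }


# Hashes recorded at the last index run (in-memory stand-in for the database).
stored = {
    "a.go": file_digest(b"package a\n"),
    "b.go": file_digest(b"package b\n"),
    "old.go": file_digest(b"package old\n"),
}
# File contents as they are now.
current = {
    "a.go": b"package a\n",                  # unchanged
    "b.go": b"package b\n\nfunc B() {}\n",   # modified
    "c.go": b"package c\n",                  # added
}
result = detect_changes(stored, current)
print(result)
```

Note that a pure hash comparison like this one cannot recognize renames; a moved file shows up as one deletion plus one addition.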

2. SCIP Extraction

CKB runs the language's SCIP indexer (scip-go for Go) to regenerate the full SCIP index, because the protobuf format doesn't support partial updates. It then:

  1. Loads the index into memory
  2. Iterates documents, only processing those in the changed set
  3. Extracts symbols, references, and call edges for changed files only
  4. Resolves caller symbols (which function contains each call site)
  5. Skips unchanged documents entirely

This means that even though the indexer runs on the full codebase, CKB only does the expensive database work for changed files.

Call Edge Extraction (v1.1): For each reference to a callable symbol (function/method), CKB:

  • Detects callables using symbol kind or the (). pattern in symbol IDs
  • Resolves the enclosing function as the caller
  • Stores edges with location info: (caller_file, line, column, callee_id)
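
The call edge steps above can be sketched as follows. This is an illustration only: the FuncDef and Ref records, the line-range lookup, and the symbol strings are assumptions made for the example, not CKB's data model. The edge tuple also carries the resolved caller ID next to the (caller_file, line, column, callee_id) fields so the result is easier to read.

```python
from typing import NamedTuple, Optional


class FuncDef(NamedTuple):
    symbol: str
    start_line: int
    end_line: int  # inclusive


class Ref(NamedTuple):
    symbol: str
    line: int
    column: int


def is_callable(symbol: str) -> bool:
    # Heuristic from the text: callable symbol IDs contain "().".
    return "()." in symbol


def enclosing_function(defs: list[FuncDef], line: int) -> Optional[str]:
    # Pick the innermost (smallest) definition range that contains the line.
    best = None
    for d in defs:
        if d.start_line <= line <= d.end_line:
            if best is None or (d.end_line - d.start_line) < (best.end_line - best.start_line):
                best = d
    return best.symbol if best else None


def extract_call_edges(file_path: str, defs: list[FuncDef], refs: list[Ref]) -> list[tuple]:
    edges = []
    for ref in refs:
        if not is_callable(ref.symbol):
            continue  # e.g. a reference to a type or variable
        caller = enclosing_function(defs, ref.line)
        if caller is None:
            continue  # reference outside any function body
        edges.append((file_path, ref.line, ref.column, caller, ref.symbol))
    return edges


# Made-up symbol IDs; real SCIP symbol strings are longer.
defs = [FuncDef("pkg/main().", 3, 8)]
refs = [
    Ref("pkg/Helper().", 5, 4),  # call inside main
    Ref("pkg/Config#", 6, 9),    # type reference, not callable
    Ref("pkg/Helper().", 1, 0),  # outside any function
]
edges = extract_call_edges("main.go", defs, refs)
print(edges)
```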

3. Delta Application

For each changed file, CKB applies updates using delete+insert:

Modified file.go:
  1. DELETE FROM file_symbols WHERE file_path = 'file.go'
  2. DELETE FROM indexed_files WHERE path = 'file.go'
  3. DELETE FROM callgraph WHERE caller_file = 'file.go'  -- v1.1
  4. DELETE FROM file_deps WHERE dependent_file = 'file.go'  -- v2
  5. INSERT new symbols, file state, call edges, and dependencies

Renamed old.go → new.go:
  1. DELETE using old path (including callgraph, file_deps)
  2. INSERT using new path

This approach is simple and correct by construction, with no complex diffing logic. Because of the caller-owned edges invariant, call edges are always deleted and rebuilt together with the file that owns them.
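
A minimal sketch of the delete+insert pattern, using Python's sqlite3 module with an in-memory database. The table names and the key columns (file_path, path, caller_file, dependent_file) come from the statements above; every other column, and the helper names, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_symbols  (file_path TEXT, symbol_id TEXT);
    CREATE TABLE indexed_files (path TEXT PRIMARY KEY, hash TEXT);
    CREATE TABLE callgraph     (caller_file TEXT, line INTEGER, col INTEGER, callee_id TEXT);
    CREATE TABLE file_deps     (dependent_file TEXT, defining_file TEXT);
""")


def delete_file(conn, path):
    # Remove every row owned by this file.
    conn.execute("DELETE FROM file_symbols WHERE file_path = ?", (path,))
    conn.execute("DELETE FROM indexed_files WHERE path = ?", (path,))
    conn.execute("DELETE FROM callgraph WHERE caller_file = ?", (path,))
    conn.execute("DELETE FROM file_deps WHERE dependent_file = ?", (path,))


def apply_file_delta(conn, path, file_hash, symbols, calls, deps, old_path=None):
    """Replace all rows for one file; pass old_path when the file was renamed."""
    with conn:  # one transaction: commit on success, roll back on error
        delete_file(conn, path)
        if old_path is not None:
            delete_file(conn, old_path)
        conn.execute("INSERT INTO indexed_files VALUES (?, ?)", (path, file_hash))
        conn.executemany("INSERT INTO file_symbols VALUES (?, ?)",
                         [(path, s) for s in symbols])
        conn.executemany("INSERT INTO callgraph VALUES (?, ?, ?, ?)",
                         [(path, line, col, callee) for (line, col, callee) in calls])
        conn.executemany("INSERT INTO file_deps VALUES (?, ?)",
                         [(path, d) for d in deps])


# Index old.go, then apply a rename to new.go that also drops its only call.
apply_file_delta(conn, "old.go", "h1", ["pkg/A()."], [(3, 2, "pkg/B().")], ["b.go"])
apply_file_delta(conn, "new.go", "h2", ["pkg/A()."], [], ["b.go"], old_path="old.go")

print(conn.execute("SELECT path FROM indexed_files").fetchall())
print(conn.execute("SELECT COUNT(*) FROM callgraph").fetchone()[0])
```

Wrapping the deletes and inserts in one transaction means a failure partway through leaves the previous rows for that file in place.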

4. Transitive Invalidation (v2)

When a file changes, other files that depend on it may have stale references. v2 adds transitive invalidation to track and optionally rescan these dependent files.

File Dependency Tracking:

  • CKB maintains a file_deps table: (dependent_file, defining_file)
  • When a.go references a symbol defined in b.go, CKB records a.go → b.go
  • Only internal dependencies are tracked (not stdlib/external packages)

Rescan Queue:

  • When b.go changes, files depending on it (a.go) are enqueued for rescanning
  • The queue tracks: file path, reason, BFS depth, and attempt count
  • Queue processing respects configurable budgets (max files, max time)
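
The enqueue step can be sketched as a breadth-first walk over the reversed file_deps edges. The function name, the entry layout, and the reason strings below are illustrative assumptions; only the tracked fields (file path, reason, BFS depth, attempt count) come from the description above.

```python
from collections import deque


def enqueue_dependents(file_deps, changed, max_depth=1):
    """Return rescan-queue entries for files that depend on the changed files.

    file_deps is an iterable of (dependent_file, defining_file) pairs.
    """
    # Reverse index: defining file -> files that depend on it.
    dependents = {}
    for dependent, defining in file_deps:
        dependents.setdefault(defining, set()).add(dependent)

    seen = set(changed)  # changed files are re-indexed anyway
    frontier = deque((path, 0) for path in sorted(changed))
    queue = []
    while frontier:
        path, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not cascade past the configured depth
        for dep in sorted(dependents.get(path, ())):
            if dep in seen:
                continue
            seen.add(dep)
            queue.append({
                "path": dep,
                "reason": f"depends on {path}",
                "depth": depth + 1,
                "attempts": 0,
            })
            frontier.append((dep, depth + 1))
    return queue


# a.go references a symbol defined in b.go; c.go references one in a.go.
deps = [("a.go", "b.go"), ("c.go", "a.go")]
direct = enqueue_dependents(deps, {"b.go"}, max_depth=1)
cascade = enqueue_dependents(deps, {"b.go"}, max_depth=2)
print([entry["path"] for entry in direct])
print([(entry["path"], entry["depth"]) for entry in cascade])
```

With depth 1 only a.go is queued; raising the depth to 2 also pulls in c.go, which is why deeper cascades need the budgets mentioned above.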

Usage

Default Behavior (Supported Languages)

Incremental indexing is enabled by default for supported languages:

  • Go - scip-go
  • TypeScript/JavaScript - scip-typescript
  • Python - scip-python
  • Dart - scip_dart
  • Rust - rust-analyzer

# Incremental by default for supported languages
ckb index

# Output for incremental update:
Incremental Index Complete
--------------------------
Files:   3 modified, 1 added, 0 deleted
Symbols: 15 added, 8 removed
Refs:    42 updated
Calls:   127 edges updated
Time:    1.2s
Commit:  abc1234 (+dirty)
Pending: 5 files queued for rescan

Accuracy:
  OK  Go to definition     - accurate
  OK  Find refs (forward)  - accurate
  !!  Find refs (reverse)  - may be stale
  OK  Callees (outgoing)   - accurate
  !!  Callers (incoming)   - may be stale

Run 'ckb index --force' for full accuracy (47 files since last full)

Force Full Reindex

# Full reindex (ignores incremental)
ckb index --force

Use --force when:

  • You need 100% accurate reverse references
  • You need accurate caller information (who calls a function)
  • After major refactoring across many files
  • When incremental reports issues
  • To clear the rescan queue and start fresh

Transitive Invalidation Modes (v2)

CKB supports four invalidation modes:

| Mode     | Behavior                                                 |
|----------|----------------------------------------------------------|
| none     | Disabled: no dependency tracking or invalidation         |
| lazy     | Enqueue dependents, drain on next full reindex (default) |
| eager    | Enqueue and drain immediately (with budgets)             |
| deferred | Enqueue and drain periodically in background             |

Lazy Mode (Default)

In lazy mode, dependent files are queued but not immediately rescanned:

  • Low overhead during incremental indexing
  • Queue drains automatically on next ckb index --force
  • Best for development workflows where occasional staleness is acceptable

Eager Mode

In eager mode, CKB rescans dependent files immediately:

  • Higher accuracy after incremental updates
  • Respects budget limits to prevent runaway processing
  • Best when accuracy is critical

Configuration

{
  "incremental": {
    "threshold": 50,
    "indexTests": false,
    "excludes": ["vendor", "testdata"]
  },
  "transitive": {
    "enabled": true,
    "mode": "lazy",
    "depth": 1,
    "maxRescanFiles": 200,
    "maxRescanMs": 1500
  }
}

| Setting        | Default | Description                                    |
|----------------|---------|------------------------------------------------|
| enabled        | true    | Enable transitive invalidation                 |
| mode           | lazy    | Invalidation mode: none, lazy, eager, deferred |
| depth          | 1       | BFS cascade depth (1 = direct dependents only) |
| maxRescanFiles | 200     | Max files to rescan per drain run              |
| maxRescanMs    | 1500    | Max time (ms) per drain run (0 = unlimited)    |
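
To show how the two budgets might interact during a drain run, here is a small sketch. The drain function, its parameters, and the injectable clock are hypothetical; only the meaning of maxRescanFiles and maxRescanMs (including 0 = unlimited time) is taken from the table above.

```python
import time


def drain(queue, rescan, max_files=200, max_ms=1500, clock=time.monotonic):
    """Rescan queued entries until the queue is empty or a budget runs out.

    Mutates `queue` in place and returns the number of entries processed.
    A max_ms of 0 disables the time budget.
    """
    start = clock()
    processed = 0
    while queue:
        if processed >= max_files:
            break  # file budget exhausted
        if max_ms > 0 and (clock() - start) * 1000 >= max_ms:
            break  # time budget exhausted
        entry = queue.pop(0)
        rescan(entry)
        processed += 1
    return processed


# Five queued files, but a budget of three files per run.
pending = [{"path": f"file{i}.go"} for i in range(5)]
rescanned = []
count = drain(pending, rescanned.append, max_files=3, max_ms=0)
print(count, "rescanned,", len(pending), "still pending")
```

Entries left in the queue after a run stay pending until the next drain or a full reindex.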

Accuracy Guarantees

Incremental indexing maintains forward accuracy but may have stale reverse references. With v1.1, call graph accuracy is improved: outgoing calls (callees) are always accurate. With v2 in eager mode with queue drained, all queries are accurate.

| Query Type                            | After Incremental | After Queue Drained |
|---------------------------------------|-------------------|---------------------|
| Go to definition                      | Always accurate   | Always accurate     |
| Find refs FROM changed files          | Always accurate   | Always accurate     |
| Find refs TO symbols in changed files | May be stale      | Accurate            |
| Call graph (callees)                  | Always accurate   | Always accurate     |
| Call graph (callers)                  | May be stale      | Accurate            |
| Symbol search                         | Always accurate   | Always accurate     |

Why Reverse References May Be Stale

Consider this scenario:

// utils.go (unchanged)
func Helper() { ... }

// main.go (changed - removed call to Helper)
func main() {
    // Helper()  <- removed this line
}

After incremental indexing:

  • main.go is re-indexed correctly (no longer references Helper)
  • utils.go is NOT re-indexed (unchanged)
  • CKB's stored references still show main.go → Helper from utils.go's perspective

This is the "caller-owned edges" invariant: references are owned by the FROM file, not the TO file.

Impact: When you ask "what calls Helper?", CKB might still show the deleted call from main.go until you run ckb index --force.

With v2 eager mode: If you change utils.go, files that depend on it (such as main.go) are automatically rescanned, keeping reverse references accurate.

Index State Tracking

CKB tracks index state in the database:

Index State:
  State: partial (3 files since last full)
  Commit: abc1234
  Dirty: yes (uncommitted changes)
  Pending: 5 files queued for rescan

States:

  • full - Complete reindex, all references accurate, queue empty
  • partial - Incremental updates applied, reverse refs may be stale
  • pending - Work queued in rescan queue (v2)
  • full_dirty / partial_dirty - Uncommitted changes detected

When Full Reindex Is Required

CKB automatically triggers a full reindex when:

| Condition               | Reason                                    |
|-------------------------|-------------------------------------------|
| No previous index       | Nothing to diff against                   |
| Schema version mismatch | Database structure changed                |
| No tracked commit       | Can't compute git diff                    |
| >50% files changed      | Incremental overhead exceeds full reindex |

You'll see messages like:

Full reindex required: schema version mismatch (have 7, need 8)
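
The conditions above amount to a short decision function. The sketch below is an assumption about how such a check could look, not CKB's code: full_reindex_reason and its parameters are hypothetical, the default threshold mirrors the "threshold": 50 setting shown under Configuration, and only the schema-mismatch wording is taken from the sample message.

```python
from typing import Optional


def full_reindex_reason(has_index: bool, db_schema: int, want_schema: int,
                        last_commit: Optional[str], changed: int, total: int,
                        threshold: int = 50) -> Optional[str]:
    """Return why a full reindex is needed, or None if incremental is fine."""
    if not has_index:
        return "no previous index"
    if db_schema != want_schema:
        return f"schema version mismatch (have {db_schema}, need {want_schema})"
    if not last_commit:
        return "no tracked commit"
    # Strictly more than threshold percent of files changed.
    if total > 0 and changed * 100 > threshold * total:
        return f"more than {threshold}% of files changed"
    return None


reason = full_reindex_reason(True, 7, 8, "abc1234", changed=1, total=100)
if reason:
    print(f"Full reindex required: {reason}")
```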

Performance Characteristics

| Scenario                    | Full Index | Incremental |
|-----------------------------|------------|-------------|
| Small project (100 files)   | ~2s        | ~0.5s       |
| Medium project (1000 files) | ~15s       | ~1-2s       |
| Large project (10000 files) | ~60s       | ~2-5s       |
| Single file change          | ~60s       | ~1s         |

The key insight: incremental time is proportional to changed files, not total files.

Transitive invalidation overhead (v2):

  • Lazy mode: negligible (~1ms to enqueue dependents)
  • Eager mode: depends on cascade size and budgets

Limitations

Current limitations:

  1. Some languages unsupported - Java, Kotlin, C++, Ruby, C#, PHP always do full reindex (build complexity)
  2. Reverse refs may be stale in lazy mode - Use eager mode or --force when accuracy is critical
  3. Callers may be stale - Incoming calls to changed symbols may be outdated until queue drains
  4. No partial SCIP - Still runs full indexer, just processes less output
  5. External deps not tracked - Only internal file dependencies are tracked
  6. Indexer must be installed - Missing indexers fall back to full reindex with install hint

Troubleshooting

"Full reindex required" every time

Check that:

  1. You're in a git repository
  2. The previous index completed successfully
  3. Schema version matches (may need --force after CKB upgrade)

Incremental seems slow

If incremental takes as long as full reindex:

  1. Check how many files changed (git status)
  2. If >50% changed, CKB falls back to full automatically
  3. Large individual files still take time to process

Stale references causing issues

If you're seeing phantom references:

# Force full reindex (also clears rescan queue)
ckb index --force

This rebuilds all references from scratch.

Too many pending rescans

If the rescan queue grows large:

# Check queue status
ckb status

# Force full reindex to clear queue
ckb index --force

Or increase budgets in configuration to process more files per run.

Delta Artifacts (v4)

Delta artifacts enable O(delta) server-side ingestion by pre-computing the diff in CI. Instead of the server comparing databases, CI generates a manifest of exactly what changed.

Why Delta Artifacts?

Traditional incremental indexing computes diffs by comparing the staging DB to the current DB—O(N) over all symbols/refs/calls. For repos with 500k+ symbols, this becomes a bottleneck.

Delta artifacts solve this by having CI emit the diff alongside the index:

┌──────────┐     ┌──────────────┐     ┌─────────────┐
│ CI Build │ ──► │ ckb diff     │ ──► │ delta.json  │
│ (scip)   │     │ (compare DBs)│     │ (manifest)  │
└──────────┘     └──────────────┘     └─────────────┘
                                             │
                                             ▼
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│ CKB Server   │ ◄── │ POST /delta     │ ◄── │ CI Upload    │
│ (apply delta)│     │ /ingest         │     │ (artifact)   │
└──────────────┘     └─────────────────┘     └──────────────┘

Generating Delta Artifacts

Use ckb diff to generate a delta manifest:

# Compare two snapshot databases
ckb diff \
  --base /path/to/old-snapshot.db \
  --new /path/to/new-snapshot.db \
  --output delta.json

# Output: delta.json with changes

Delta JSON Schema

{
  "delta_schema_version": 1,
  "base_snapshot_id": "sha256:abc123...",
  "new_snapshot_id": "sha256:def456...",
  "commit": "def456789",
  "timestamp": 1703260800,
  "deltas": {
    "symbols": {
      "added": ["scip-go...NewFunc()."],
      "modified": ["scip-go...ChangedFunc()."],
      "deleted": ["scip-go...RemovedFunc()."]
    },
    "refs": {
      "added": [{"pk": "f_abc:42:12:scip-go...Foo().", "data": {...}}],
      "deleted": ["f_abc:50:5:scip-go...Old()."]
    },
    "callgraph": { "added": [...], "deleted": [...] },
    "files": { "added": [...], "modified": [...], "deleted": [...] }
  },
  "stats": { "total_added": 45, "total_modified": 12, "total_deleted": 8 }
}
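
As a rough picture of how the added/modified/deleted lists in this schema could be derived, the sketch below diffs two snapshots represented as dictionaries of symbol ID to content hash. That representation and the helper names are assumptions for illustration; the actual comparison is whatever ckb diff does against the two snapshot databases.

```python
def diff_entities(base: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Diff two snapshots given as entity ID -> content hash."""
    added = sorted(new.keys() - base.keys())
    deleted = sorted(base.keys() - new.keys())
    modified = sorted(k for k in base.keys() & new.keys() if base[k] != new[k])
    return {"added": added, "modified": modified, "deleted": deleted}


def build_delta(base_id: str, new_id: str,
                base_symbols: dict[str, str], new_symbols: dict[str, str]) -> dict:
    """Assemble a manifest with the symbols section and matching stats."""
    symbols = diff_entities(base_symbols, new_symbols)
    return {
        "delta_schema_version": 1,
        "base_snapshot_id": base_id,
        "new_snapshot_id": new_id,
        "deltas": {"symbols": symbols},
        "stats": {
            "total_added": len(symbols["added"]),
            "total_modified": len(symbols["modified"]),
            "total_deleted": len(symbols["deleted"]),
        },
    }


# Made-up symbol IDs and hashes.
base_symbols = {"pkg/A().": "h1", "pkg/B().": "h2", "pkg/C().": "h3"}
new_symbols = {"pkg/A().": "h1", "pkg/B().": "h2-changed", "pkg/D().": "h4"}
delta = build_delta("sha256:base", "sha256:new", base_symbols, new_symbols)
print(delta["deltas"]["symbols"])
print(delta["stats"])
```

A real manifest would repeat the same diff for refs, callgraph, and files and sum all of them into stats.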

Ingesting Delta Artifacts

Upload delta artifacts to CKB server via the API:

# Validate delta without applying
curl -X POST http://localhost:8080/delta/validate \
  -H "Content-Type: application/json" \
  -d @delta.json

# Ingest delta artifact
curl -X POST http://localhost:8080/delta/ingest \
  -H "Content-Type: application/json" \
  -d @delta.json

Server Validation

Before applying a delta, the server validates:

  1. Schema version - delta_schema_version must be supported
  2. Base snapshot - base_snapshot_id must match current active snapshot
  3. Counts - Entity counts must match stats
  4. Hashes - Spot-check hashes for modified entities
  5. Integrity - Foreign key relationships must be valid

If validation fails, the server rejects the delta and requires a full snapshot.
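
The first three checks can be sketched in a few lines. This is illustrative only: validate_delta is a hypothetical function, the set of supported versions is assumed, and the sketch assumes the stats totals are sums across all entity types. Hash spot-checks and foreign-key integrity (checks 4 and 5) need database access and are left out.

```python
SUPPORTED_SCHEMA_VERSIONS = {1}  # assumption for this example


def validate_delta(delta: dict, active_snapshot_id: str) -> list[str]:
    """Return a list of validation errors; an empty list means the delta passed."""
    errors = []
    if delta.get("delta_schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
        errors.append("unsupported delta_schema_version")
    if delta.get("base_snapshot_id") != active_snapshot_id:
        errors.append("base_snapshot_id does not match the active snapshot")

    # Count entries per change kind across symbols, refs, callgraph, files.
    counts = {"added": 0, "modified": 0, "deleted": 0}
    for entity in delta.get("deltas", {}).values():
        for kind in counts:
            counts[kind] += len(entity.get(kind, []))
    stats = delta.get("stats", {})
    for kind, n in counts.items():
        declared = stats.get(f"total_{kind}")
        if declared != n:
            errors.append(f"stats.total_{kind} is {declared} but the delta contains {n}")
    return errors


sample = {
    "delta_schema_version": 1,
    "base_snapshot_id": "sha256:base",
    "deltas": {
        "symbols": {"added": ["pkg/New()."], "modified": [], "deleted": ["pkg/Old()."]},
        "refs": {"added": [{"pk": "f1:4:2:pkg/New()."}], "deleted": []},
    },
    "stats": {"total_added": 2, "total_modified": 0, "total_deleted": 1},
}
ok_errors = validate_delta(sample, "sha256:base")
stale_errors = validate_delta(sample, "sha256:other")
print(ok_errors)
print(stale_errors)
```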

Configuration

{
  "ingestion": {
    "deltaArtifacts": true,
    "deltaValidation": "strict",
    "fallbackToStagingDiff": true
  }
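}

| Setting               | Default | Description                              |
|-----------------------|---------|------------------------------------------|
| deltaArtifacts        | true    | Enable delta artifact ingestion          |
| deltaValidation       | strict  | Validation mode: strict or permissive    |
| fallbackToStagingDiff | true    | Fall back to staging diff if delta fails |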

CI Integration Example (GitHub Actions)

name: Index and Upload Delta

on:
  push:
    branches: [main]

jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

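      # Note: download-artifact@v4 only fetches artifacts from the current
      # workflow run by default; restoring a snapshot uploaded by an earlier
      # run additionally needs the `run-id` and `github-token` inputs.
      - name: Download previous snapshot
        uses: actions/download-artifact@v4
        with:
          name: ckb-snapshot
          path: .ckb/
        continue-on-error: true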

      - name: Run SCIP indexer
        run: ckb index

      - name: Generate delta
        run: |
          if [ -f .ckb/previous.db ]; then
            ckb diff --base .ckb/previous.db --new .ckb/ckb.db --output delta.json
          fi

      - name: Upload delta to CKB server
        if: hashFiles('delta.json') != ''
        run: |
          # --fail makes the step fail when the server rejects the delta
          curl --fail -X POST "${{ secrets.CKB_SERVER_URL }}/delta/ingest" \
            -H "Authorization: Bearer ${{ secrets.CKB_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d @delta.json

      - name: Save snapshot for next run
        run: cp .ckb/ckb.db .ckb/previous.db

      - uses: actions/upload-artifact@v4
        with:
          name: ckb-snapshot
          path: .ckb/previous.db
          retention-days: 7

Performance Impact

| Repo Size    | Traditional Diff | Delta Artifact |
|--------------|------------------|----------------|
| 10k symbols  | 50ms             | 5ms            |
| 100k symbols | 500ms            | 10ms           |
| 500k symbols | 5s               | 20ms           |

Delta artifacts shift the diff computation to CI (where it runs once) instead of the server (where it would run on every request).