Satori Architecture

Two packages work in concert: packages/core is the trust layer (merkle/hash-on-change sync, partial-scan safety, path-normalization SSOT). packages/mcp is the agent interface (six-tool contract, freshness gating, search orchestration, sidecar lifecycle, snapshot/fingerprint gates). Zilliz Cloud (Milvus) and a call-graph sidecar provide the retrieval substrate.

Six MCP tools • Sync-on-read for search • Fingerprint gates • Merkle incremental sync • Call-graph sidecar
Six tools = six eyes into your repo: search → outline → graph → read.

System Overview

Agent requests route through a fixed six-tool registry. The MCP runtime enforces freshness/fingerprint gates and orchestrates search. Core sync provides deterministic file-state truth for indexing.

Diagram components:

  • Agent: Claude / Codex / Cursor
  • Tool Registry: 6 MCP tools
  • MCP Runtime: ToolHandlers & orchestration, freshness gate, snapshot + fingerprint gates
  • Vector Store: Zilliz / Milvus
  • Call-Graph Sidecar: AST graph artifacts
  • Core Sync: file sync / merkle / hash

Runtime Boundaries

MCP Runtime (@zokizuan/satori-mcp)

  • Owns the six-tool MCP contract and schema validation.
  • Runs freshness gating: search_codebase performs a sync-on-read before it searches.
  • Applies snapshot/fingerprint compatibility gates with deterministic requires_reindex envelopes.
  • Schedules background periodic sync and optional debounced watcher sync.
  • Reconciles .satoriignore/.gitignore edits without full reindex.
  • Reconciles cloud snapshot metadata before handler execution to keep local state aligned.
  • Manages sidecar lifecycle and fallback behavior when graph data is unavailable.
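The freshness gate described in the bullets above can be pictured as a small decision function. This is a minimal sketch, not the runtime's actual code: the names decideFreshness, FreshnessDecision, skip_sync, and sync_first are hypothetical, and the only value taken from this document is the roughly three-minute staleness window listed in the search pipeline.

```typescript
// Hypothetical sketch of a sync-on-read freshness decision.
// Only the ~3 minute window comes from the document; every name is an assumption.
const STALENESS_WINDOW_MS = 3 * 60 * 1000;

interface FreshnessDecision {
  action: "skip_sync" | "sync_first";
  ageMs: number;
}

function decideFreshness(
  lastSyncAtMs: number,
  nowMs: number,
  windowMs: number = STALENESS_WINDOW_MS,
): FreshnessDecision {
  // Clamp to zero so a clock that moved backwards never yields a negative age.
  const ageMs = Math.max(0, nowMs - lastSyncAtMs);
  // Inside the window the index is treated as fresh enough to search directly.
  return { action: ageMs < windowMs ? "skip_sync" : "sync_first", ageMs };
}
```

Passing the clock in as nowMs keeps the decision pure, which is one way to make the gate easy to test deterministically.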

Core Engine (@zokizuan/satori-core)

  • Maintains the sync trust layer: stat-first, hash-on-change, deterministic merkle root.
  • Preserves prior entries under partial scans (unreadable files and directories).
  • Enforces path normalization SSOT for scan/snapshot/diff flows.
  • Re-embeds only changed files for incremental indexing performance.
  • Performs AST-aware chunking/splitting and remains MCP-agnostic.
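The partial-scan rule in the bullets above (keep prior entries when part of the tree could not be read) might look roughly like the following. This is an illustrative sketch under assumptions: the snapshot is modeled as a plain path-to-hash map, and mergePartialScan and isUnder are hypothetical names, not the core engine's API.

```typescript
// Hypothetical sketch: merge a partial scan with the prior snapshot.
// A snapshot is modeled here as Map<relativePath, contentHash> (an assumption).
type Snapshot = Map<string, string>;

// True when `path` equals the prefix or lives underneath it as a directory.
function isUnder(path: string, prefix: string): boolean {
  const base = prefix.endsWith("/") ? prefix.slice(0, -1) : prefix;
  return path === base || path.startsWith(base + "/");
}

function mergePartialScan(
  prior: Snapshot,
  scanned: Snapshot,
  unreadablePrefixes: string[],
): Snapshot {
  const merged: Snapshot = new Map(scanned);
  for (const [path, hash] of prior) {
    if (merged.has(path)) continue; // the fresh scan result wins
    // A path missing from the scan is only a real removal if it was readable.
    const shielded = unreadablePrefixes.some((p) => isUnder(path, p));
    if (shielded) merged.set(path, hash);
  }
  return merged;
}
```

With this shape, a file under an unreadable directory survives the scan, while a file that disappeared from a readable location is dropped and would show up as removed in the next diff.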

Tool Surface

Satori intentionally exposes only six MCP tools to reduce routing ambiguity while keeping call-chain traversal and file navigation first-class.

list_codebases
  Purpose: Inventory tracked roots and indexing state.
  Key params: None.
  Notes: Deterministic bucket order with lexicographically sorted paths per bucket.

manage_index
  Purpose: Index lifecycle and recovery operations.
  Key params: action, path, optional force, customExtensions, ignorePatterns, zillizDropCollection.
  Notes: Actions: create | reindex | sync | status | clear. No public splitter knob.

search_codebase
  Purpose: Runtime-first semantic search with grouped/raw output.
  Key params: query, scope, resultMode, groupBy, rankingMode, limit, optional debug.
  Notes: Defaults: scope=runtime, resultMode=grouped, groupBy=symbol, rankingMode=auto_changed_first. Includes freshnessDecision and grouped callGraphHint.

file_outline
  Purpose: Sidecar-backed symbol outline and exact resolver.
  Key params: path, file, start_line, end_line, limitSymbols, resolveMode, symbolIdExact, symbolLabelExact.
  Notes: Exact mode returns deterministic statuses: ok | ambiguous | not_found; includes hasMore.

call_graph
  Purpose: Traverse callers/callees from a symbolRef.
  Key params: path, symbolRef, direction, depth, limit.
  Notes: Uses prebuilt sidecar; deterministic node/edge ordering. Alias bidirectional is normalized to both.

read_file
  Purpose: Safe file reads with optional line ranges and annotated mode.
  Key params: path, start_line, end_line, mode, open_symbol.
  Notes: mode=annotated returns content + outlineStatus. Large reads are truncated by READ_FILE_MAX_LINES.
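To make the documented defaults concrete, here is a small sketch of how a caller-side helper could fill in search_codebase defaults and normalize the call_graph direction alias. The parameter names and default values come from the tool listing above; the SearchParams type, the helper names, and the choice to type scope, groupBy, and rankingMode as plain strings are assumptions made for illustration.

```typescript
// Hypothetical request shape; field names follow the tool listing above.
interface SearchParams {
  path: string;
  query: string;
  scope?: string;
  resultMode?: "grouped" | "raw";
  groupBy?: string;
  rankingMode?: string;
  limit?: number;
  debug?: boolean;
}

// Fill in the documented defaults without overriding explicit caller choices.
function withSearchDefaults(p: SearchParams) {
  return {
    ...p,
    scope: p.scope ?? "runtime",
    resultMode: p.resultMode ?? "grouped",
    groupBy: p.groupBy ?? "symbol",
    rankingMode: p.rankingMode ?? "auto_changed_first",
  };
}

// The alias "bidirectional" is normalized to "both"; other values pass through.
function normalizeDirection(direction: string): string {
  return direction === "bidirectional" ? "both" : direction;
}
```

Using ?? rather than a blind object spread means an explicitly passed undefined still resolves to the default.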

State and Safety

Snapshot state controls searchable access. Fingerprint mismatches are hard-gated into deterministic requires_reindex outcomes until explicit recovery.

State diagram:

  • States: not_indexed, indexing, ready, failed, sync_completed, requires_reindex
  • Transition labels: manage_index(create), success, failure, manage_index(sync), auto-sync, fingerprint mismatch, manage_index(reindex), manage_index(clear) from any state

Fingerprint Fields

{ embeddingProvider, embeddingModel, embeddingDimension, vectorStoreProvider, schemaVersion }

Safety Result

Prevents silent reads from incompatible embedding models/dimensions or schema-mode mismatches.
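A fingerprint gate of this kind reduces to a field-by-field comparison. The sketch below uses the five fingerprint fields listed above. The Fingerprint and GateResult types, the function name, and the assumption that schemaVersion is a string are illustrative rather than taken from the codebase.

```typescript
// Fields as listed in the document; the value types are assumptions.
interface Fingerprint {
  embeddingProvider: string;
  embeddingModel: string;
  embeddingDimension: number;
  vectorStoreProvider: string;
  schemaVersion: string;
}

// A fixed field order keeps the mismatch list deterministic.
const FINGERPRINT_FIELDS: Array<keyof Fingerprint> = [
  "embeddingProvider",
  "embeddingModel",
  "embeddingDimension",
  "vectorStoreProvider",
  "schemaVersion",
];

type GateResult =
  | { status: "ok" }
  | { status: "requires_reindex"; mismatched: Array<keyof Fingerprint> };

function checkFingerprint(indexed: Fingerprint, runtime: Fingerprint): GateResult {
  const mismatched = FINGERPRINT_FIELDS.filter((f) => indexed[f] !== runtime[f]);
  return mismatched.length === 0
    ? { status: "ok" }
    : { status: "requires_reindex", mismatched };
}
```

Any single mismatch produces requires_reindex, which matches the hard-gate behavior described here: there is no partial compatibility mode in this sketch.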

Search And Sync Behavior

Search Pipeline

freshness gate (search_codebase sync-on-read, ~3m staleness window)
  -> operator parse (lang:, path:, -path:, must:, exclude:)
       path operators use gitignore-style matching over normalized relative paths
  -> bounded candidate retrieval (SEARCH_MAX_CANDIDATES)
  -> deterministic filter precedence:
       scope -> lang -> path include -> path exclude -> must -> exclude
  -> rerank policy (docs scope skipped by default; graceful degradation on failure)
  -> changed-files boost (git status cache TTL ~5s, max changed-files threshold 50)
  -> grouping + diversity caps (deterministic relaxed pass)
  -> tie-break chain:
       score desc -> file asc -> start_line asc -> symbol label asc -> symbol id asc
  -> noise hint emission when visible top-K crosses noise threshold
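Two steps of the pipeline above lend themselves to a short sketch: operator parsing and the tie-break chain. The code below is a simplified illustration, not the actual parser. It assumes operators are whitespace-separated tokens with no quoting, and the ParsedQuery and Hit shapes are hypothetical. The operator prefixes and the sort order (score desc, file asc, start_line asc, symbol label asc, symbol id asc) are taken from the pipeline.

```typescript
type OperatorKey = "lang" | "path" | "excludePath" | "must" | "exclude";

interface ParsedQuery {
  text: string;
  lang: string[];
  path: string[];
  excludePath: string[];
  must: string[];
  exclude: string[];
}

const OPERATORS: Array<[string, OperatorKey]> = [
  ["lang:", "lang"],
  ["path:", "path"],
  ["-path:", "excludePath"],
  ["must:", "must"],
  ["exclude:", "exclude"],
];

function parseOperators(query: string): ParsedQuery {
  const parsed: ParsedQuery = { text: "", lang: [], path: [], excludePath: [], must: [], exclude: [] };
  const freeText: string[] = [];
  for (const token of query.split(/\s+/).filter((t) => t.length > 0)) {
    // An operator with an empty value (e.g. "lang:") is kept as free text.
    const op = OPERATORS.find(([prefix]) => token.startsWith(prefix) && token.length > prefix.length);
    if (op) parsed[op[1]].push(token.slice(op[0].length));
    else freeText.push(token);
  }
  parsed.text = freeText.join(" ");
  return parsed;
}

interface Hit {
  score: number;
  file: string;
  startLine: number;
  symbolLabel: string;
  symbolId: string;
}

// Code-unit comparison avoids locale-dependent ordering.
function compareStrings(x: string, y: string): number {
  return x < y ? -1 : x > y ? 1 : 0;
}

// score desc -> file asc -> start_line asc -> symbol label asc -> symbol id asc
function compareHits(a: Hit, b: Hit): number {
  return (
    b.score - a.score ||
    compareStrings(a.file, b.file) ||
    a.startLine - b.startLine ||
    compareStrings(a.symbolLabel, b.symbolLabel) ||
    compareStrings(a.symbolId, b.symbolId)
  );
}
```

Because every comparison key is total and the chain ends on a symbol id, two runs over the same candidates produce the same order, which is the point of the tie-break chain.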

Ignore Model

ignore sources:
  .gitignore + .satoriignore + configured ignore patterns

reconcile flow:
  control-file signature change detected
    -> delete newly ignored indexed paths
    -> reload synchronizer
    -> forced incremental sync
    -> update signature/version
    -> coalesce concurrent reconcile requests

Even when the watcher is not running, convergence is still guaranteed by the signature checks performed during ensureFreshness.
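The last step of the reconcile flow, coalescing concurrent requests, is commonly done by sharing one in-flight promise. The following is a generic sketch of that pattern rather than Satori's implementation; coalesce and the task callback are hypothetical names.

```typescript
// Wrap an async task so that overlapping callers share a single run.
function coalesce(task: () => Promise<void>): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  return () => {
    // A reconcile is already running: join it instead of starting another.
    if (inFlight) return inFlight;
    const run = task().finally(() => {
      // Clear the slot so a later change triggers a fresh reconcile.
      if (inFlight === run) inFlight = null;
    });
    inFlight = run;
    return run;
  };
}
```

A caller would wrap the reconcile routine once, for example `const reconcile = coalesce(runReconcile)`, where runReconcile stands in for whatever function performs the delete, reload, and forced-sync steps.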

Capability Resolver + Rerank

capability profile (fast / standard / slow)
  -> default and max search limits
  -> rerank policy decision
  -> docs scope skips rerank by policy in current public surface
  -> no public useReranker override knob

If rerank is attempted and fails, responses degrade with RERANKER_FAILED warnings and deterministic ordering is preserved.
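The degradation path can be sketched as a wrapper that falls back to the pre-rerank order and attaches a warning. This is a simplified illustration: the reranker is modeled as a synchronous function for brevity (a real one would presumably be an asynchronous service call), and applyRerank and SearchOutcome are hypothetical names. The RERANKER_FAILED warning code and the docs-scope skip come from the text above.

```typescript
interface SearchOutcome<T> {
  results: T[];
  warnings: string[];
}

function applyRerank<T>(
  scope: string,
  candidates: T[],
  rerank: (items: T[]) => T[],
): SearchOutcome<T> {
  // Policy from the document: docs scope skips rerank.
  if (scope === "docs") return { results: [...candidates], warnings: [] };
  try {
    return { results: rerank([...candidates]), warnings: [] };
  } catch {
    // Degrade gracefully: keep the deterministic pre-rerank order and flag it.
    return { results: [...candidates], warnings: ["RERANKER_FAILED"] };
  }
}
```

Passing a copy to the reranker means a failure partway through cannot leave the original candidate order mutated.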

Incremental Sync

stat-first scan (size + mtime signature)
  -> reuse prior hash when unchanged
  -> hash bytes only for changed/new candidates
  -> deterministic merkleRoot from sorted (relativePath, hash)
  -> diff {added, removed, modified}
  -> apply vector updates incrementally

Partial scans preserve prior entries for unreadable files/dirs to avoid false removals; true removals under unreadable prefixes are delayed until readability returns.
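The scan-to-diff steps above can be sketched end to end. Several things here are assumptions made to keep the example dependency-free: an FNV-1a-style hash over UTF-16 code units stands in for a cryptographic hash such as SHA-256, file contents are supplied by a callback instead of read from disk, and the "merkle root" is collapsed to a single digest over the sorted (relativePath, hash) pairs rather than a full tree. All type and function names are hypothetical.

```typescript
interface StatSig { size: number; mtimeMs: number }
interface Entry extends StatSig { hash: string }
type Snapshot = Map<string, Entry>;

// Stand-in hash (assumption); a real sync layer would likely use SHA-256.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Stat-first: reuse the prior hash when size and mtime are unchanged.
function scanFile(prior: Entry | undefined, stat: StatSig, readContent: () => string): Entry {
  if (prior && prior.size === stat.size && prior.mtimeMs === stat.mtimeMs) return prior;
  return { size: stat.size, mtimeMs: stat.mtimeMs, hash: fnv1a(readContent()) };
}

// Deterministic root: sort by relative path so insertion order never matters.
function merkleRoot(snapshot: Snapshot): string {
  const leaves = [...snapshot.entries()]
    .map(([path, entry]) => [path, entry.hash])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return fnv1a(leaves.map(([path, hash]) => path + "\u0000" + hash).join("\n"));
}

function diffSnapshots(prev: Snapshot, next: Snapshot) {
  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];
  for (const [path, entry] of next) {
    const old = prev.get(path);
    if (!old) added.push(path);
    else if (old.hash !== entry.hash) modified.push(path);
  }
  for (const path of prev.keys()) {
    if (!next.has(path)) removed.push(path);
  }
  return { added: added.sort(), removed: removed.sort(), modified: modified.sort() };
}
```

Comparing the two roots first gives a cheap "nothing changed" check; only when they differ does the per-file diff need to run, and only the paths it reports would be re-embedded.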

Transcript-Style Agent Calls

Representative MCP calls and output shape from real usage patterns.

Called satori.list_codebases({})
Called satori.manage_index({"action":"status","path":"/home/hamza/repo/tradingview_ratio"})
-> sync completed | +0 added, -0 removed, ~0 modified

Called satori.search_codebase({"path":"/home/hamza/repo/tradingview_ratio","query":"auth refresh token flow","scope":"runtime","resultMode":"grouped","groupBy":"symbol","limit":5})
-> Hit: src/auth/session.ts:88-141
-> callGraphHint.symbolRef: {"file":"src/auth/session.ts","symbolId":"sym_auth_refresh"}

Called satori.call_graph({"path":"/home/hamza/repo/tradingview_ratio","symbolRef":{"file":"src/auth/session.ts","symbolId":"sym_auth_refresh"},"direction":"both","depth":2,"limit":20})
-> nodes: 6, edges: 8, notes: 1

Called satori.file_outline({"path":"/home/hamza/repo/tradingview_ratio","file":"src/auth/session.ts","limitSymbols":50})
-> 12 symbols resolved | spans locked

Called satori.read_file({"path":"src/auth/session.ts","open_symbol":"refreshToken","mode":"annotated"})
-> lines 88-141 | outlineStatus: ok | no warnings

File not found recovery:
Called satori.search_codebase({"path":"/home/hamza/repo/tradingview_ratio","query":"INVARIANTS.md canonical path","scope":"docs","resultMode":"raw","limit":10})
Called satori.read_file({"path":"/home/hamza/repo/tradingview_ratio/docs/INVARIANTS.md","start_line":480,"end_line":620,"mode":"annotated"})