Sprint Brief: Knowledge Architecture KA-3

Traversal Retrieval Path

What You’re Building

The knowledge graph built in KA-1 (schema) and KA-2 (structural assignment) is populated but passive — nothing reads from it at retrieval time. This sprint wires traversal-first retrieval into the session-start and recall paths so that the graph actively influences which memories get surfaced.

The core idea: when an entity is in scope (matched by project path, query terms, or checkpoint), walk its aspects, constraints, and one-hop dependencies to collect structurally relevant memory IDs. These join the candidate pool alongside the existing effective-score and embedding candidates.

The hard invariant: constraints always surface when their entity is in scope, regardless of score rank.

Required Reading

docs/specs/INDEX.md — Cross-Cutting Invariants (especially #5: constraints always surface)
docs/specs/complete/knowledge-architecture-schema.md — section 7 (retrieval contract)
docs/KNOWLEDGE-ARCHITECTURE.md — conceptual model
docs/specs/SPRINT-BRIEF-KA1.md — schema and helpers this depends on
docs/specs/SPRINT-BRIEF-KA2.md — structural assignment (populates the tables this sprint reads from)

Prerequisites

KA-1 and KA-2 must be complete:

entity_aspects, entity_attributes, entity_dependencies, task_meta tables exist and are indexed
KA-1 CRUD helpers in knowledge-graph.ts are working
KA-2 structural assignment pipeline is populating rows
getConstraintsForEntity() returns correct results

Current Retrieval Architecture

Understanding the existing flow is critical. Here’s how retrieval works today — KA-3 adds a new candidate source without replacing anything.

Session-start (`handleSessionStart` in hooks.ts:753)

getAllScoredCandidates(project, limit) — queries memories ordered by created_at DESC, scores each with effectiveScore(importance, createdAt, pinned) using 5%/day decay, filters effScore > 0.2
selectWithBudget(candidates, 2000) — picks top candidates within 2000 char budget
getPredictedContextMemories(project, 10, 600, excludeIds) — extracts recurring terms from recent session summaries, FTS query for supplementary memories within 600 char budget
recordSessionCandidates(sessionKey, candidates, injectedIds) — records all candidates + which were injected into session_memories
Recovery checkpoint injection — loads latest checkpoint digest within 4h window, reserved separately from main budget

Hybrid recall (`hybridRecall` in memory-search.ts:114)

BM25 keyword search (FTS5) — normalize scores to [0,1]
Vector search (sqlite-vec) — normalize cosine distances to [0,1]
Score merge — alpha * vec + (1-alpha) * bm25 blend
Rehearsal boost — log(access_count + 1) * recencyFactor
Graph boost — getGraphBoostIds() in graph-search.ts, 1-hop through relations table via memory_entity_mentions
Optional reranker — cross-encoder re-rank of top-N
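The score merge step above can be sketched as a simple linear blend. This is a minimal illustration that assumes both inputs are already normalized to [0,1]; `blendScores` is a hypothetical name, not the function in memory-search.ts.

```typescript
// Hypothetical helper illustrating the blend; not the actual memory-search.ts code.
function blendScores(vec: number, bm25: number, alpha: number): number {
  // alpha = 1 uses only the vector score, alpha = 0 uses only the BM25 score.
  return alpha * vec + (1 - alpha) * bm25;
}
```

Because both inputs lie in [0,1] and the weights sum to 1, the blended score also stays in [0,1], which is what lets later boosts use the same scale.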

Current graph boost (graph-search.ts)

The existing getGraphBoostIds() does:

Tokenize query, match entities by canonical_name LIKE %token%
One-hop expansion through relations table (both directions)
Collect memory IDs via memory_entity_mentions
Apply flat boost weight (default 0.15)

This uses the OLD graph structure (relations + memory_entity_mentions). KA-3 adds a PARALLEL traversal path through the NEW KA tables (entity_aspects, entity_attributes, entity_dependencies).

Deliverables

1. Traversal query builder

New file: packages/daemon/src/pipeline/graph-traversal.ts

This is the core of KA-3. A single function that takes focal entity IDs and returns a structurally coherent set of memory IDs plus constraint content.

export interface TraversalResult {
  /** Memory IDs collected from entity_attributes.memory_id */
  readonly memoryIds: Set<string>;
  /** Constraint content that must always be surfaced */
  readonly constraints: ReadonlyArray<{
    readonly entityName: string;
    readonly content: string;
    readonly importance: number;
  }>;
  /** Entities traversed (for telemetry) */
  readonly entityCount: number;
  /** Whether traversal hit the timeout */
  readonly timedOut: boolean;
}

export interface TraversalConfig {
  /** Max aspects per entity, ordered by weight DESC (default 10) */
  readonly maxAspectsPerEntity: number;
  /** Max attributes per aspect (default 20) */
  readonly maxAttributesPerAspect: number;
  /** Max one-hop dependency expansions (default 30) */
  readonly maxDependencyHops: number;
  /** Minimum dependency strength to traverse (default 0.3) */
  readonly minDependencyStrength: number;
  /** Timeout in ms (default 500) */
  readonly timeoutMs: number;
}

export function traverseKnowledgeGraph(
  focalEntityIds: ReadonlyArray<string>,
  db: ReadDb,
  agentId: string,
  config: TraversalConfig,
): TraversalResult;

Traversal algorithm (all synchronous, same pattern as getGraphBoostIds):

1. For each focal entity:
   a. Pull all active constraints (entity_attributes where kind='constraint' and status='active') via JOIN through entity_aspects. These go into the constraints output unconditionally.
   b. Pull top aspects by weight DESC, limited to maxAspectsPerEntity.
   c. For each aspect, pull active attributes, limited to maxAttributesPerAspect. Collect memory_id values (skip NULL; those are unclassified stubs from pass 1).
2. One-hop dependency expansion:
   a. Query entity_dependencies for focal entities where strength >= minDependencyStrength, limited to maxDependencyHops.
   b. For each dependency target entity, repeat step 1 (constraints + top aspects + attributes). Do NOT recurse further: one hop only.
3. Deduplicate memory IDs across all collected attributes.
4. Check the deadline at each major step (same Date.now() pattern as graph-search.ts).

Key design decisions:

Constraints from dependency targets are also collected (if entity X depends on entity Y, Y’s constraints matter for X’s context)
Rows where memory_id IS NULL are skipped (awaiting KA-2 classification)
The function is read-only: it takes a ReadDb and has no side effects
Timeout protection at each step, returns partial results on timeout
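The algorithm and design decisions above can be sketched against in-memory arrays instead of SQL. This is only an illustration of the control flow: the `Graph`, `Limits`, and `traverseSketch` names, the field names, and the non-'active' status value are all assumptions, and the real function would query the KA tables through a ReadDb. The sketch also treats every active row (constraints included) as eligible in the attribute step, which the brief does not pin down.

```typescript
// In-memory stand-ins for the KA tables. All names here are hypothetical.
interface Aspect { id: string; entityId: string; weight: number }
interface Attribute {
  aspectId: string;
  kind: "attribute" | "constraint";
  status: "active" | "superseded";
  content: string;
  memoryId: string | null;
}
interface Dependency { sourceId: string; targetId: string; strength: number }
interface Graph { aspects: Aspect[]; attributes: Attribute[]; dependencies: Dependency[] }
interface Limits {
  maxAspects: number;
  maxAttributes: number;
  maxHops: number;
  minStrength: number;
  timeoutMs: number;
}

function traverseSketch(focal: ReadonlyArray<string>, g: Graph, lim: Limits) {
  const deadline = Date.now() + lim.timeoutMs;
  const memoryIds = new Set<string>(); // a Set deduplicates IDs across entities
  const constraints: { entityId: string; content: string }[] = [];
  const visited = new Set<string>();
  let timedOut = false;

  const visit = (entityId: string): void => {
    if (visited.has(entityId)) return;
    visited.add(entityId);
    const aspects = g.aspects.filter((a) => a.entityId === entityId);
    const aspectIds = new Set(aspects.map((a) => a.id));
    // (a) Active constraints are collected without any aspect or attribute limit.
    for (const at of g.attributes) {
      if (aspectIds.has(at.aspectId) && at.kind === "constraint" && at.status === "active") {
        constraints.push({ entityId, content: at.content });
      }
    }
    // (b) Top aspects by weight, descending.
    const top = [...aspects].sort((a, b) => b.weight - a.weight).slice(0, lim.maxAspects);
    // (c) Active attributes per aspect; rows with a null memory ID are skipped.
    for (const asp of top) {
      const attrs = g.attributes
        .filter((at) => at.aspectId === asp.id && at.status === "active")
        .slice(0, lim.maxAttributes);
      for (const at of attrs) {
        if (at.memoryId !== null) memoryIds.add(at.memoryId);
      }
    }
  };

  for (const id of focal) {
    if (Date.now() > deadline) { timedOut = true; break; }
    visit(id);
  }
  if (!timedOut) {
    const focalSet = new Set(focal);
    const targets = g.dependencies
      .filter((d) => focalSet.has(d.sourceId) && d.strength >= lim.minStrength)
      .slice(0, lim.maxHops)
      .map((d) => d.targetId);
    for (const id of targets) {
      if (Date.now() > deadline) { timedOut = true; break; }
      visit(id); // one hop only: a target's own dependencies are never followed
    }
  }
  // On timeout, whatever was collected so far is returned as a partial result.
  return { memoryIds, constraints, entityCount: visited.size, timedOut };
}
```

Note that the dependency query only looks at edges whose source is a focal entity, which is what keeps the expansion at exactly one hop.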

2. Focal entity resolution

New function in graph-traversal.ts:

export interface FocalEntityResult {
  readonly entityIds: string[];
  readonly source: 'project' | 'checkpoint' | 'query' | 'session_key';
}

export function resolveFocalEntities(
  db: ReadDb,
  agentId: string,
  signals: {
    project?: string;
    sessionKey?: string;
    checkpointEntityIds?: string[];
    queryTokens?: string[];
  },
): FocalEntityResult;

Resolution priority:

1. Checkpoint entity IDs: if the recovery checkpoint includes structural snapshot fields (KA-5 future), use those directly
2. Project path: match the project path against entity names/canonical names where entity_type = 'project'
3. Session key lineage: look up the most recent checkpoint for this session key, extract entity mentions from its digest
4. Query tokens: tokenize and match against entity canonical_name (same as getGraphBoostIds tokenizer)

For now, focus on project path matching (#2) and query token matching (#4). The checkpoint fields (#1) are KA-5 and session key lineage (#3) is a nice-to-have.

Project path matching:

SELECT id FROM entities
WHERE agent_id = ?
  AND entity_type = 'project'
  AND (canonical_name LIKE ? OR name LIKE ?)
ORDER BY mentions DESC
LIMIT 5

Normalize the project path by extracting the last one or two directory segments as search tokens. For example, /home/nicholai/signet/signetai yields the LIKE patterns %signetai% and %signet%.
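The path normalization can be sketched as a small helper that produces the LIKE patterns bound to the query above. `projectPathPatterns` is a hypothetical name, and the sketch assumes segments contain no LIKE wildcards (% or _); escaping those is left out.

```typescript
// Hypothetical helper: turns a project path into LIKE patterns, most specific first.
function projectPathPatterns(projectPath: string, maxSegments = 2): string[] {
  // Split on forward or back slashes and drop empty segments (leading/trailing slashes).
  const segments = projectPath.split(/[\\/]+/).filter((s) => s.length > 0);
  // Keep the last N segments; reverse so the leaf directory comes first.
  const tail = segments.slice(-maxSegments).reverse();
  // Deduplicate in case a path repeats a segment (e.g. foo/foo).
  const unique = Array.from(new Set(tail));
  return unique.map((s) => `%${s}%`);
}
```

For the path in the example, this returns the patterns for signetai and then signet, each of which would be tried against both canonical_name and name.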

3. Wire traversal into session-start

Where: packages/daemon/src/hooks.ts, inside handleSessionStart

After getAllScoredCandidates() and before selectWithBudget():

Resolve focal entities from req.project
Call traverseKnowledgeGraph() with focal entity IDs
Merge traversal memory IDs into the candidate pool:
- For each traversal memory ID not already in allCandidates, fetch the memory row and add it with source 'ka_traversal'
- Traversal candidates get a synthetic effective score based on the attribute’s importance (not the decay-based score)
Inject constraint content as a dedicated section in the output, AFTER the “Relevant Memories” section but BEFORE recovery context

Constraint injection format:

## Active Constraints

Constraints for entities in scope. These always apply.

- [EntityName] content of constraint
- [EntityName] another constraint

Budget: Constraints get their own reserved budget (default 1000 chars), carved out of maxInjectChars alongside the recovery context reservation. Constraints are never truncated by the main budget — they are appended after budget truncation, same pattern as recovery context.

Key requirement: if there are no constraints and no traversal memories, this path must be a no-op, with no overhead beyond the focal entity resolution query.
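A minimal sketch of the constraint section and the budget carve-out, under stated assumptions: `renderConstraintSection` and `mainBudgetChars` are hypothetical names, ordering by importance is a guess, and what happens when constraints exceed their own reservation is not specified in this brief, so the sketch does not truncate them.

```typescript
interface ConstraintLine {
  entityName: string;
  content: string;
  importance: number;
}

// Hypothetical renderer for the "Active Constraints" section format.
function renderConstraintSection(constraints: ReadonlyArray<ConstraintLine>): string {
  // No constraints in scope: emit nothing, so the caller appends nothing.
  if (constraints.length === 0) return "";
  const lines = [...constraints]
    .sort((a, b) => b.importance - a.importance) // ordering is an assumption
    .map((c) => `- [${c.entityName}] ${c.content}`);
  return [
    "## Active Constraints",
    "",
    "Constraints for entities in scope. These always apply.",
    "",
    ...lines,
  ].join("\n");
}

// Hypothetical budget split: both reservations are carved out of maxInjectChars
// before main selection, so main-budget truncation never touches them.
function mainBudgetChars(
  maxInjectChars: number,
  recoveryReserveChars: number,
  constraintReserveChars = 1000,
): number {
  return Math.max(0, maxInjectChars - recoveryReserveChars - constraintReserveChars);
}
```

The rendered section would be appended after the main budget has been applied, mirroring how recovery context is handled.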

4. Wire traversal into hybrid recall

Where: packages/daemon/src/memory-search.ts, inside hybridRecall

After the existing graph boost block (line ~270) and before the reranker:

Resolve focal entities from query tokens (use the same tokenizer as getGraphBoostIds)
Call traverseKnowledgeGraph() with focal entity IDs
For each traversal memory ID:
- If already in scored, apply a boost (same pattern as graph boost: (1 - tw) * score + tw where tw is configurable, default 0.2)
- If NOT in scored, add it with a base score derived from attribute importance
Re-sort after boost application
Constraints from traversal are NOT injected in recall (recall is a search, not context assembly — constraints only apply at session-start)
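The boost and insert steps can be sketched as follows. `applyTraversalBoost` and `importanceById` are hypothetical names, and the base score for new candidates (tw times importance, with 0.5 as a fallback importance) is an assumption; the brief only says the base score is derived from attribute importance.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Hypothetical helper illustrating the boost; not the actual hybridRecall code.
function applyTraversalBoost(
  scored: ReadonlyArray<Scored>,
  traversalIds: ReadonlySet<string>,
  importanceById: ReadonlyMap<string, number>,
  tw = 0.2,
): Scored[] {
  const seen = new Set<string>();
  const out: Scored[] = scored.map((s) => {
    seen.add(s.id);
    // (1 - tw) * score + tw moves the score toward 1 and keeps it within [0,1].
    return traversalIds.has(s.id) ? { id: s.id, score: (1 - tw) * s.score + tw } : { ...s };
  });
  traversalIds.forEach((id) => {
    if (!seen.has(id)) {
      // Assumed base score for candidates that keyword/vector search did not return.
      out.push({ id, score: tw * (importanceById.get(id) ?? 0.5) });
    }
  });
  // Re-sort after boosting so the reranker sees the updated order.
  return out.sort((a, b) => b.score - a.score);
}
```

With tw = 0.2, a candidate scored 0.5 rises to 0.6, while a candidate already at 1.0 stays at 1.0, so the boost never pushes scores out of range.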

Guard: Only run if cfg.pipelineV2.graph.enabled is set, traversal is enabled in config, and the KA tables exist. Use a try/catch with graceful fallback (same pattern as existing graph boost).

5. Update candidate pool fusion

Where: packages/daemon/src/hooks.ts and packages/daemon/src/session-memories.ts

The KA spec defines the new candidate pool as:

traversal pool ∪ effective top-50 ∪ embedding top-50

Currently it’s just effective top-N. After this deliverable:

Extend SessionMemoryCandidate.source type:

source: 'effective' | 'fts_only' | 'ka_traversal';

Record traversal candidates in session_memories with source = 'ka_traversal'
Cap the merged pool at a configurable limit (default 100) before budget selection

This ensures the predictive scorer (KA-4) can see which candidates came from traversal vs effective score vs FTS.
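One way the fusion and cap could look, as a sketch only: `mergeCandidatePool` and `PoolCandidate` are hypothetical names, the "first pool wins" rule for duplicates follows the session-start wording (traversal IDs are added only when not already present), and capping by score is an assumption.

```typescript
type CandidateSource = "effective" | "fts_only" | "ka_traversal";

interface PoolCandidate {
  id: string;
  score: number;
  source: CandidateSource;
}

// Hypothetical merge: union by memory ID, earlier pools win on duplicates,
// then keep the highest-scoring candidates up to the cap.
function mergeCandidatePool(
  pools: ReadonlyArray<ReadonlyArray<PoolCandidate>>,
  cap = 100,
): PoolCandidate[] {
  const byId = new Map<string, PoolCandidate>();
  for (const pool of pools) {
    for (const c of pool) {
      if (!byId.has(c.id)) byId.set(c.id, c);
    }
  }
  return Array.from(byId.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, cap);
}
```

Passing the effective pool first preserves the existing source label for memories found both ways, so only genuinely new candidates are recorded as 'ka_traversal'.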

6. Traversal configuration

Add to PipelineV2Config (in packages/core/src/types.ts):

readonly traversal?: {
  readonly enabled: boolean;              // default true
  readonly maxAspectsPerEntity: number;    // default 10
  readonly maxAttributesPerAspect: number; // default 20
  readonly maxDependencyHops: number;      // default 30
  readonly minDependencyStrength: number;  // default 0.3
  readonly timeoutMs: number;             // default 500
  readonly boostWeight: number;           // default 0.2
  readonly constraintBudgetChars: number; // default 1000
};

Wire defaults in packages/daemon/src/memory-config.ts with YAML parsing, same pattern as structural config from KA-2.

Guard: traversal only runs when traversal.enabled && graph.enabled. If KA tables don’t exist yet (migration hasn’t run), traversal silently returns empty results.
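The defaults and the guard can be sketched like this. `TraversalSettings`, `resolveTraversalConfig`, and `traversalActive` are hypothetical names; the real wiring in memory-config.ts would also validate the parsed YAML values, which this sketch skips.

```typescript
interface TraversalSettings {
  enabled: boolean;
  maxAspectsPerEntity: number;
  maxAttributesPerAspect: number;
  maxDependencyHops: number;
  minDependencyStrength: number;
  timeoutMs: number;
  boostWeight: number;
  constraintBudgetChars: number;
}

// Default values as listed in the config block above.
const TRAVERSAL_DEFAULTS: TraversalSettings = {
  enabled: true,
  maxAspectsPerEntity: 10,
  maxAttributesPerAspect: 20,
  maxDependencyHops: 30,
  minDependencyStrength: 0.3,
  timeoutMs: 500,
  boostWeight: 0.2,
  constraintBudgetChars: 1000,
};

// Fill in any field that is missing from the parsed config.
function resolveTraversalConfig(raw?: Partial<TraversalSettings>): TraversalSettings {
  const r: Partial<TraversalSettings> = raw ?? {};
  const d = TRAVERSAL_DEFAULTS;
  return {
    enabled: r.enabled ?? d.enabled,
    maxAspectsPerEntity: r.maxAspectsPerEntity ?? d.maxAspectsPerEntity,
    maxAttributesPerAspect: r.maxAttributesPerAspect ?? d.maxAttributesPerAspect,
    maxDependencyHops: r.maxDependencyHops ?? d.maxDependencyHops,
    minDependencyStrength: r.minDependencyStrength ?? d.minDependencyStrength,
    timeoutMs: r.timeoutMs ?? d.timeoutMs,
    boostWeight: r.boostWeight ?? d.boostWeight,
    constraintBudgetChars: r.constraintBudgetChars ?? d.constraintBudgetChars,
  };
}

// Traversal runs only when both the traversal and graph switches are on.
function traversalActive(t: TraversalSettings, graphEnabled: boolean): boolean {
  return t.enabled && graphEnabled;
}
```

Using `??` rather than object spread means an explicit `undefined` in the parsed config falls back to the default instead of overwriting it.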

7. Telemetry

Add traversal metrics to the session-start log entry (already logged at hooks.ts:966):

traversalEntities: number;     // focal entities resolved
traversalMemories: number;     // unique memory IDs from traversal
traversalConstraints: number;  // constraints surfaced
traversalTimedOut: boolean;    // whether traversal hit timeout

Also add to the /api/pipeline/status endpoint so the dashboard can show traversal health.
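A small sketch of how the four fields might be derived. `toTraversalTelemetry` and `TraversalOutcome` are hypothetical names. Note one detail from the field comments: traversalEntities counts focal entities resolved, which is not the same number as TraversalResult.entityCount (entities traversed, including dependency targets).

```typescript
// Subset of the traversal result needed for telemetry (hypothetical shape).
interface TraversalOutcome {
  memoryIds: ReadonlySet<string>;
  constraints: ReadonlyArray<unknown>;
  timedOut: boolean;
}

interface TraversalTelemetry {
  traversalEntities: number;
  traversalMemories: number;
  traversalConstraints: number;
  traversalTimedOut: boolean;
}

// Pass null for the result when traversal was skipped (disabled, or no KA tables).
function toTraversalTelemetry(
  focalEntityIds: ReadonlyArray<string>,
  result: TraversalOutcome | null,
): TraversalTelemetry {
  return {
    traversalEntities: focalEntityIds.length,
    traversalMemories: result ? result.memoryIds.size : 0,
    traversalConstraints: result ? result.constraints.length : 0,
    traversalTimedOut: result ? result.timedOut : false,
  };
}
```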

Key Files

packages/daemon/src/pipeline/graph-traversal.ts — new, core traversal logic
packages/daemon/src/pipeline/graph-search.ts — existing graph boost (reference, not modified)
packages/daemon/src/hooks.ts — wire traversal into session-start
packages/daemon/src/memory-search.ts — wire traversal into recall
packages/daemon/src/session-memories.ts — extend source type
packages/daemon/src/knowledge-graph.ts — KA-1 helpers (read, not modified)
packages/core/src/types.ts — traversal config types
packages/daemon/src/memory-config.ts — traversal config defaults

What NOT to Build (KA-4+)

Predictor structural features (KA-4)
Checkpoint structural snapshots (KA-5)
Dashboard visualization of graph traversal (KA-5)
Multi-hop traversal beyond one-hop dependencies (future)
API endpoints for browsing aspects/attributes (future)
Automatic task execution from task_meta (out of scope)

Verification

bun run build — no type errors
bun test — existing tests pass
bun run typecheck — clean
With graph populated (KA-2 has run on some memories):
- Session-start with a known project path resolves focal entities
- Traversal collects memory IDs from entity_attributes
- Constraints appear in the “Active Constraints” section of inject
Save a constraint fact (e.g., “never push directly to main for signetai”) — verify it appears in session-start inject when project path matches
Save multiple facts about a project entity — verify traversal pulls them into session-start candidates
Verify constraint budget is reserved separately (constraints survive main budget truncation)
Verify traversal is a no-op when no KA data exists (empty tables)
Verify traversal respects traversal.enabled = false
Verify traversal timeout works (doesn’t block session-start)
Verify hybrid recall boosts traversal candidates already in the result set and adds those that are missing
Verify session_memories records traversal candidates with source = 'ka_traversal'
Check telemetry: traversal metrics appear in session-start log