Knowledge Architecture Schema and Traversal Spec
Status: Approved (v1)
Audience: Core + Daemon maintainers
Spec metadata:
- ID: knowledge-architecture-schema
- Status: approved
- Hard depends on: memory-pipeline-v2, session-continuity-protocol, procedural-memory-plan
- Blocks: predictive-memory-scorer
- Registry: docs/specs/INDEX.md
Related docs:
- docs/KNOWLEDGE-ARCHITECTURE.md (conceptual model)
- docs/specs/planning/predictive-memory-scorer.md (learned ranking)
- docs/specs/complete/memory-pipeline-plan.md (pipeline contracts)
- docs/specs/approved/procedural-memory-plan.md (skills as procedural memory)
- docs/specs/approved/session-continuity-protocol.md (checkpoint and recovery)
1) Purpose
KNOWLEDGE-ARCHITECTURE.md defines the conceptual model (entity -> aspect
-> attribute/constraint, plus dependency traversal). This spec turns that
model into an implementation contract with:
- additive schema changes
- extraction and backfill contracts
- traversal-first retrieval contracts
- integration points with predictive scoring and continuity checkpoints
This is the structural floor that predictive ranking should run on.
Local dependency graph:
flowchart LR
MP[memory-pipeline-v2] --> KA[knowledge-architecture-schema]
SCP[session-continuity-protocol] --> KA
PM[procedural-memory-plan] --> KA
KA --> PMS[predictive-memory-scorer]
2) Scope and Non-Goals
In scope
- Entity/aspect/attribute/constraint/task representation in SQLite
- Dependency edges as explicit graph structure
- Session-start traversal contracts for context injection
- Cross-spec contracts with scorer, procedural memory, and continuity
Out of scope (this revision)
- Multi-hop planning/reasoning beyond one-hop dependency traversal
- Automatic task execution
- Autonomous destructive mutations without existing policy gates
3) Baseline (Current State)
Current graph-relevant state already in repo:
- entities, relations, and memory_entity_mentions exist
- session_memories exists and stores candidate/injection telemetry
- session_checkpoints exists and stores continuity digests
- Predictor crate and training pipeline exist through Phase 2
Current gap:
The system has entity mentions and relation edges, but no first-class representation for aspects, constraints, or task lifecycle. Retrieval is still primarily search/scoring-first, not traversal-first.
4) Cross-Spec Contract Map
| Spec | Produces | Consumes from this spec |
|---|---|---|
| memory-pipeline-plan.md | extraction + mutation pipeline | structural assignment contract, schema ownership, backfill behavior |
| predictive-memory-scorer.md | ranking model + training loop | traversal candidate pool, structural features (entity/aspect/constraint) |
| procedural-memory-plan.md | skill nodes + procedural decay | shared entity/aspect model (entity_type='skill') |
| session-continuity-protocol.md | checkpoint + recovery | focal entity/aspect snapshot for recovery injection and training context |
Normative rule: predictive ranking is an enhancer. Traversal-defined structure is the primary retrieval floor.
5) Data Model (Additive)
5.1 Entity type taxonomy
Extend entities.entity_type usage to the canonical set:
person, project, system, tool, concept, skill, task, unknown (fallback)
5.2 Backfill: agent_id on entities
The entities table (migration 002) predates the multi-agent scoping
invariant. Add agent_id with a default and index:
ALTER TABLE entities ADD COLUMN agent_id TEXT NOT NULL DEFAULT 'default';
CREATE INDEX idx_entities_agent ON entities(agent_id);
All new KA tables include agent_id for database-level tenant isolation.
This is not a KA concern — it is the multi-agent invariant applied
uniformly. Queries filter by agent_id unless explicitly requesting
cross-agent results.
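The backfill can be guarded so the migration is safe to re-run. A minimal sketch, assuming the runner can read column names from PRAGMA table_info(entities); the helper name pendingEntityBackfillSql is hypothetical:

```typescript
// Sketch of an idempotent backfill step (hypothetical helper; the real
// migration runner in 019-knowledge-structure.ts may differ). Given the
// column names reported by PRAGMA table_info(entities), it returns only
// the statements that still need to run, so re-running is a no-op.
function pendingEntityBackfillSql(existingColumns: string[]): string[] {
  const stmts: string[] = [];
  if (!existingColumns.includes("agent_id")) {
    stmts.push(
      "ALTER TABLE entities ADD COLUMN agent_id TEXT NOT NULL DEFAULT 'default'"
    );
  }
  // CREATE INDEX IF NOT EXISTS is idempotent on its own.
  stmts.push(
    "CREATE INDEX IF NOT EXISTS idx_entities_agent ON entities(agent_id)"
  );
  return stmts;
}
```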
5.3 New table: entity_aspects
CREATE TABLE entity_aspects (
id TEXT PRIMARY KEY,
entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
agent_id TEXT NOT NULL DEFAULT 'default',
name TEXT NOT NULL,
canonical_name TEXT NOT NULL,
weight REAL NOT NULL DEFAULT 0.5,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
UNIQUE(entity_id, canonical_name)
);
CREATE INDEX idx_entity_aspects_entity ON entity_aspects(entity_id);
CREATE INDEX idx_entity_aspects_agent ON entity_aspects(agent_id);
CREATE INDEX idx_entity_aspects_weight ON entity_aspects(weight DESC);
weight is structural centrality + learned utility. It is not pure frequency.
5.4 New table: entity_attributes
CREATE TABLE entity_attributes (
id TEXT PRIMARY KEY,
entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
aspect_id TEXT REFERENCES entity_aspects(id) ON DELETE SET NULL, -- NULL until Pass 2a assigns an aspect
agent_id TEXT NOT NULL DEFAULT 'default',
memory_id TEXT REFERENCES memories(id) ON DELETE SET NULL,
kind TEXT NOT NULL, -- 'attribute' | 'constraint'
content TEXT NOT NULL,
normalized_content TEXT NOT NULL,
confidence REAL NOT NULL DEFAULT 0.0,
importance REAL NOT NULL DEFAULT 0.5,
status TEXT NOT NULL DEFAULT 'active', -- 'active' | 'superseded' | 'deleted'
superseded_by TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE INDEX idx_entity_attributes_entity ON entity_attributes(entity_id);
CREATE INDEX idx_entity_attributes_aspect ON entity_attributes(aspect_id);
CREATE INDEX idx_entity_attributes_agent ON entity_attributes(agent_id);
CREATE INDEX idx_entity_attributes_kind ON entity_attributes(kind);
CREATE INDEX idx_entity_attributes_status ON entity_attributes(status);
Constraints are first-class rows (kind='constraint'), not inferred tags.
5.5 New table: entity_dependencies
CREATE TABLE entity_dependencies (
id TEXT PRIMARY KEY,
source_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
target_entity_id TEXT NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
agent_id TEXT NOT NULL DEFAULT 'default',
aspect_id TEXT REFERENCES entity_aspects(id) ON DELETE SET NULL,
dependency_type TEXT NOT NULL, -- 'uses' | 'requires' | 'owned_by' | 'blocks' | 'informs'
strength REAL NOT NULL DEFAULT 0.5,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE INDEX idx_entity_dependencies_source ON entity_dependencies(source_entity_id);
CREATE INDEX idx_entity_dependencies_target ON entity_dependencies(target_entity_id);
CREATE INDEX idx_entity_dependencies_agent ON entity_dependencies(agent_id);
These are explicit traversal edges. They are not similarity artifacts.
5.6 New table: task_meta
CREATE TABLE task_meta (
entity_id TEXT PRIMARY KEY REFERENCES entities(id) ON DELETE CASCADE,
agent_id TEXT NOT NULL DEFAULT 'default',
status TEXT NOT NULL, -- 'open' | 'in_progress' | 'blocked' | 'done' | 'cancelled'
expires_at TEXT,
retention_until TEXT,
completed_at TEXT,
updated_at TEXT NOT NULL
);
CREATE INDEX idx_task_meta_agent ON task_meta(agent_id);
CREATE INDEX idx_task_meta_status ON task_meta(status);
CREATE INDEX idx_task_meta_retention ON task_meta(retention_until);
Tasks share entity structure but use separate lifecycle rules.
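The separate lifecycle rules can be made explicit as a transition guard over task_meta.status. The spec fixes only the status vocabulary, so the transition map below is an assumption:

```typescript
// Hypothetical sketch of task lifecycle rules over task_meta.status.
// The allowed-transition map is an assumption; only the status
// vocabulary itself comes from the schema above.
type TaskStatus = "open" | "in_progress" | "blocked" | "done" | "cancelled";

const ALLOWED: Record<TaskStatus, TaskStatus[]> = {
  open: ["in_progress", "blocked", "cancelled"],
  in_progress: ["blocked", "done", "cancelled"],
  blocked: ["open", "in_progress", "cancelled"],
  done: [],      // terminal
  cancelled: [], // terminal
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return ALLOWED[from].includes(to);
}
```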
6) Structural Assignment (Two-Pass Architecture)
Structural assignment uses a two-pass architecture to balance speed and accuracy. Pass 1 runs synchronously on the hot path with no LLM call. Pass 2 runs in the background as separate pipeline jobs.
6.1 Pass 1: Heuristic entity linking (synchronous, no LLM)
After fact extraction and entity persistence, the pipeline links each written fact memory to its primary entity:
- Resolve the primary entity from the extraction triple's source field (already persisted by txPersistEntities)
- Create a stub entity_attributes row with aspect_id = NULL and kind = 'attribute' (default)
- Enqueue two background jobs: structural_classify and structural_dependency
Pass 1 does NOT attempt aspect classification or constraint detection. It only establishes the fact → entity link. This is cheap and reliable — the extraction already identifies entities.
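The Pass 1 output can be sketched as a pure builder. The helper name linkFactToEntity and the job payload shape are assumptions; row fields follow section 5.4:

```typescript
// Sketch of Pass 1: build the stub entity_attributes row plus the two
// background jobs to enqueue. No LLM call, no aspect classification.
interface StubAttribute {
  id: string;
  entity_id: string;
  aspect_id: null; // unclassified until Pass 2a
  agent_id: string;
  memory_id: string;
  kind: "attribute"; // Pass 2a may reclassify to 'constraint'
  content: string;
  normalized_content: string;
  status: "active";
}

function linkFactToEntity(opts: {
  id: string;
  entityId: string;
  agentId: string;
  memoryId: string;
  content: string;
}): { row: StubAttribute; jobs: string[] } {
  return {
    row: {
      id: opts.id,
      entity_id: opts.entityId,
      aspect_id: null,
      agent_id: opts.agentId,
      memory_id: opts.memoryId,
      kind: "attribute",
      content: opts.content,
      normalized_content: opts.content.trim().toLowerCase(),
      status: "active",
    },
    jobs: ["structural_classify", "structural_dependency"],
  };
}
```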
6.2 Pass 2a: Structural classification (background, LLM)
A dedicated LLM prompt classifies each unassigned fact into an aspect and determines whether it is an attribute or constraint.
Input per batch (max 8-10 facts):
- The parent entity (name, type, existing aspects)
- Suggested aspect patterns for the entity type
- The fact content
Output per fact:
- aspect — existing or new aspect name
- kind — 'attribute' or 'constraint'
- new — whether this creates a new aspect
Job type: structural_classify. Same lease/retry/dead-letter mechanics
as extraction jobs. Batched by entity to provide aspect context.
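Batching by entity with the tested size limit might look like this; the helper name batchForClassification is hypothetical:

```typescript
// Sketch of batching for structural_classify jobs: facts are grouped by
// parent entity (so the prompt can include existing aspects), then each
// group is split into chunks no larger than the tested batch limit.
interface PendingFact { entityId: string; factId: string }

function batchForClassification(
  facts: PendingFact[],
  maxBatch = 8 // conservative end of the tested 8-10 range
): PendingFact[][] {
  const byEntity = new Map<string, PendingFact[]>();
  for (const f of facts) {
    const group = byEntity.get(f.entityId) ?? [];
    group.push(f);
    byEntity.set(f.entityId, group);
  }
  const batches: PendingFact[][] = [];
  for (const group of byEntity.values()) {
    for (let i = 0; i < group.length; i += maxBatch) {
      batches.push(group.slice(i, i + maxBatch));
    }
  }
  return batches;
}
```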
6.3 Pass 2b: Dependency extraction (background, LLM)
A separate LLM prompt identifies structural dependencies between entities implied by fact content.
Input per batch (max 5 facts):
- The source entity
- The fact content
- Known entities in the graph (for target resolution)
Output per fact:
- dep_target — target entity name (or null)
- dep_type — 'uses' | 'requires' | 'owned_by' | 'blocks' | 'informs' (or null)
Pre-filter: only facts whose extraction triples reference other entities are sent to this pass. Pure self-referential facts skip it entirely.
Job type: structural_dependency. Independent queue from classification.
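The pre-filter can be expressed as a predicate over the extraction triples. The triple shape here is an assumption based on the source/target wording in this spec:

```typescript
// Sketch of the Pass 2b pre-filter: a fact is enqueued only when at
// least one of its extraction triples references an entity other than
// the source entity itself.
interface Triple { source: string; target?: string }

function needsDependencyPass(sourceEntity: string, triples: Triple[]): boolean {
  return triples.some(
    (t) => t.target !== undefined && t.target !== sourceEntity
  );
}
```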
6.4 Assignment invariants
- Every active atomic fact memory should map to exactly one primary entity_attributes row.
- Constraints always map to kind='constraint'.
- Dependency edges are additive and idempotent.
- superseded attributes remain auditable; they do not vanish.
- Pass 2a and 2b are isolated — errors in one do not affect the other.
- Facts with aspect_id = NULL are valid (awaiting classification).
6.5 Backfill behavior
Maintenance worker backfills unassigned legacy memories incrementally:
- scan memories with no entity_attributes row, in batches
- run Pass 1 (entity linking), then enqueue Pass 2 jobs
- skip low-confidence rows and record telemetry
- never block foreground hooks
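One batch step of the backfill, sketched as a pure partition so skipped rows feed telemetry. The confidence threshold and row shape are assumptions:

```typescript
// Sketch of one backfill batch step: low-confidence rows are skipped
// but counted for telemetry rather than silently dropped.
interface LegacyMemory { id: string; confidence: number }

function partitionBackfillBatch(
  batch: LegacyMemory[],
  minConfidence = 0.3 // assumed threshold
): { link: LegacyMemory[]; skipped: number } {
  const link = batch.filter((m) => m.confidence >= minConfidence);
  return { link, skipped: batch.length - link.length };
}
```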
6.6 Model constraints (tested against qwen3:4b)
- Classification prompt handles 8-10 facts per batch reliably
- Dependency prompt handles 5 facts per batch reliably
- Beyond these limits, the model drops facts or loses format discipline
- Prompt must use short JSON field names and minimal boilerplate
- /no_think flag suppresses chain-of-thought for structured output
- temperature: 0.1 for deterministic classification
- Prompt specifications are documented in the KA-2 sprint brief
7) Retrieval Contract (Traversal First)
7.1 Session-start context assembly
Order of operations:
- Resolve focal entities from session signals (project path, checkpoint, session key lineage, prompt hints)
- Pull all active constraints for focal entities and one-hop dependencies
- Pull top aspects by weight for each focal entity
- Pull active attributes under those aspects
- Materialize candidate memory IDs via entity_attributes.memory_id
This produces a structurally coherent candidate pool before heuristic or model ranking runs.
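The final step, materializing a deduplicated candidate ID list, might be sketched as follows; constraint-backed rows are ordered first so later caps cannot drop them (row shape is an assumption):

```typescript
// Sketch of candidate materialization: structural rows become a
// deduplicated, ordered list of memory IDs, constraints first.
interface StructuralRow {
  memoryId: string | null; // NULL when the attribute has no source memory
  kind: "attribute" | "constraint";
}

function materializeCandidates(rows: StructuralRow[]): string[] {
  const ordered = [...rows].sort((a, b) =>
    a.kind === b.kind ? 0 : a.kind === "constraint" ? -1 : 1
  );
  const seen = new Set<string>();
  const out: string[] = [];
  for (const r of ordered) {
    if (r.memoryId && !seen.has(r.memoryId)) {
      seen.add(r.memoryId);
      out.push(r.memoryId);
    }
  }
  return out;
}
```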
7.2 Candidate pool fusion with predictor pre-filter
Predictor pre-filter contract changes from:
effective top-50 ∪ embedding top-50
to:
traversal pool ∪ effective top-50 ∪ embedding top-50
Then dedupe and cap (configurable, default 100).
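A sketch of the fusion under these rules. The function name and the always-keep set are assumptions; the latter protects constraint-backed IDs from the cap:

```typescript
// Sketch of the fused pre-filter pool: traversal candidates lead, then
// effective top-50, then embedding top-50, deduped and capped. IDs in
// alwaysKeep (e.g. constraint-backed memories) are moved ahead of the
// cap so they survive it.
function fusePools(
  traversal: string[],
  effective: string[],
  embedding: string[],
  alwaysKeep: Set<string> = new Set(),
  cap = 100 // configurable default from the text above
): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const id of [...traversal, ...effective, ...embedding]) {
    if (!seen.has(id)) {
      seen.add(id);
      merged.push(id);
    }
  }
  const kept = merged.filter((id) => alwaysKeep.has(id));
  const rest = merged.filter((id) => !alwaysKeep.has(id));
  return [...kept, ...rest].slice(0, cap);
}
```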
7.3 Hard retrieval invariant
Constraints are always surfaced when their entity is in scope, independent of score rank.
8) Predictive Scorer Integration
predictive-memory-scorer.md consumes this spec in three places:
- Candidate quality: scorer receives structurally coherent candidates
- Feature enrichment: add structural features per candidate
- entity slot hash
- aspect slot hash
- is_constraint
- Evaluation slices: report win/loss by focal entity/project, not only global EMA
The predictor still earns influence via comparisons. This spec improves its input quality and interpretability.
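The slot-hash features require a hash that is stable across training and serving. A sketch using 32-bit FNV-1a; the hash choice and slot count are assumptions:

```typescript
// Sketch of a stable slot hash for entity/aspect features. Any stable
// string hash bucketed into a fixed slot count works, as long as the
// training pipeline and the daemon agree on it.
function slotHash(value: string, slots = 1024): number {
  let h = 0x811c9dc5; // FNV-1a offset basis (32-bit)
  for (let i = 0; i < value.length; i++) {
    h ^= value.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  return (h >>> 0) % slots; // force unsigned, then bucket
}
```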
9) Procedural Memory Integration
procedural-memory-plan.md remains authoritative for skill lifecycle.
Alignment rules:
- Skills remain entity_type='skill'
- Skill metadata (skill_meta) remains the source of truth for runtime skill behavior
- Skill knowledge can also map into entity_aspects / entity_attributes for unified traversal and scoring
This keeps one graph with type-specific lifecycle rules.
10) Continuity Protocol Integration
session-continuity-protocol.md integration points:
- checkpoint digests add optional structural snapshot fields:
- focal entities
- active aspects
- surfaced constraints
- recovery injection should prioritize these structural snapshots over raw narrative when budget is tight
- predictor label quality improves when session-end evaluation knows which constraints and aspects were in play
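The recovery-injection priority could be sketched as a budget-aware selector. The snapshot shape and the character-budget heuristic are assumptions:

```typescript
// Sketch of recovery injection: when the narrative fits the budget it
// is used as-is; otherwise the structural snapshot leads and the
// narrative fills whatever budget remains.
interface StructuralSnapshot {
  focalEntities: string[];
  activeAspects: string[];
  surfacedConstraints: string[];
}

function pickRecoveryPayload(
  snapshot: StructuralSnapshot | null,
  narrative: string,
  budgetChars: number
): string {
  if (narrative.length <= budgetChars || !snapshot) {
    return narrative.slice(0, budgetChars);
  }
  const structural = [
    `entities: ${snapshot.focalEntities.join(", ")}`,
    `aspects: ${snapshot.activeAspects.join(", ")}`,
    `constraints: ${snapshot.surfacedConstraints.join("; ")}`,
  ].join("\n");
  const remaining = budgetChars - structural.length - 1;
  return remaining > 0
    ? structural + "\n" + narrative.slice(0, remaining)
    : structural.slice(0, budgetChars);
}
```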
11) Migration and Phase Plan
KA-1 Schema and types
- Add migration 019-knowledge-structure.ts:
  - Backfill agent_id on entities table
  - Create entity_aspects, entity_attributes, entity_dependencies, task_meta — all with agent_id column
- Add core types and read/write helpers
KA-2 Structural assignment in pipeline
- Add assignment stage in summary/extraction path
- Persist mappings for newly extracted atomic facts
- Add telemetry for assignment confidence and coverage
KA-3 Traversal retrieval path
- Add traversal query builder in daemon
- Wire session-start and recall flows to include traversal candidates
- Enforce constraint surfacing invariant
KA-4 Predictor coupling
- Extend predictor request payload with structural features
- Update comparison/audit APIs with structural slices
KA-5 Continuity + dashboard
- Store structural checkpoint slices
- Surface entity/aspect/constraint context in dashboard timeline and predictor inspector
12) Acceptance Criteria
- ≥90% of active atomic fact memories have structural assignment (entity + aspect + attribute/constraint)
- Session-start context includes constraint rows for in-scope entities with zero omissions in test fixtures
- Traversal candidate pool remains bounded and deterministic
- Predictor comparison reports include structural slices (entity/project)
- Recovery injections include structural snapshot fields when available
13) Open Questions
- Should aspects be free-form with canonicalization, or backed by a small taxonomy per entity type?
- Should task retention default to fixed duration or confidence-driven decay?
- Do we need a dedicated constraints table later for policy-level joins, or is entity_attributes (kind='constraint') sufficient?
14) Immediate Next Steps
- Approve this spec as the implementation contract for structural retrieval.
- Update predictive scorer Phase 3 tasks to include traversal pool fusion.
- Draft migration 019-knowledge-structure.ts with exact indexes and idempotency behavior.
- Add a small offline benchmark set comparing traversal-first candidate generation vs the current heuristic pre-filter.