Sprint Brief: Knowledge Architecture KA-2
Structural Assignment Pipeline (Two-Pass)
What You’re Building
Facts extracted by the pipeline get structurally assigned to entities,
aspects, and constraints. This is the bridge between flat fact storage
and the structured knowledge graph described in
docs/KNOWLEDGE-ARCHITECTURE.md.
The architecture is two-pass:
- Pass 1 runs synchronously after extraction. No LLM. Links facts to their primary entity.
- Pass 2 runs in the background as pipeline jobs. Uses LLM to classify aspects/constraints (2a) and extract dependencies (2b).
Required Reading
- `docs/specs/INDEX.md` — Cross-Cutting Invariants
- `docs/specs/complete/knowledge-architecture-schema.md` — section 6 (structural assignment architecture)
- `docs/KNOWLEDGE-ARCHITECTURE.md` — conceptual model
- `docs/specs/SPRINT-BRIEF-KA1.md` — KA-1 deliverables (schema and helpers this sprint depends on)
Prerequisites
KA-1 must be complete before this sprint begins:
- Migration 019 landed (entity_aspects, entity_attributes, entity_dependencies, task_meta tables exist)
- Core types exported (EntityAspect, EntityAttribute, etc.)
- Read/write helpers in `knowledge-graph.ts` working
Deliverables
1. Pass 1: Heuristic entity linking
Where: Hook into worker.ts after txPersistEntities completes.
After extraction persists entity triples and the decision phase writes new fact memories, run pass 1 for each written fact:
- Look up the primary entity from the extraction triple's `source` field (already in the `entities` table via `txPersistEntities`)
- Create a stub `entity_attributes` row:

  ```ts
  {
    id: crypto.randomUUID(),
    aspectId: null,          // awaiting classification
    agentId: 'default',
    memoryId: newMemoryId,   // the fact memory just written
    kind: 'attribute',       // default, may be reclassified
    content: factContent,
    normalizedContent: normalizedFactContent,
    confidence: fact.confidence,
    importance: 0.5,         // default
    status: 'active',
    supersededBy: null,
  }
  ```

- Enqueue a `structural_classify` job for this fact
- If the extraction triple has a target entity that exists in the graph, also enqueue a `structural_dependency` job
Schema note: entity_attributes.aspect_id is NOT NULL in the
current spec. For pass 1 stubs, either:
- (a) Create a catch-all “unclassified” aspect per entity, or
- (b) ALTER the FK to allow NULL (preferred — cleaner, no fake aspects)
Recommend option (b): update migration 019 to make aspect_id
nullable. Facts with aspect_id = NULL are valid and mean “awaiting
structural classification.”
Key constraint: Pass 1 must NOT call the LLM. It runs on the hot path inside the existing extraction worker. Keep it fast.
2. New job types in pipeline
Add two new job types alongside the existing `'extract'` type:

```ts
type PipelineJobType = 'extract' | 'structural_classify' | 'structural_dependency';
```
Both use the same memory_jobs table with different job_type values.
Same lease/retry/dead-letter mechanics. Same exponential backoff.
Job payload for `structural_classify`:

```json
{
  "memory_id": "...",
  "entity_id": "...",
  "entity_name": "...",
  "entity_type": "project",
  "fact_content": "ooIDE uses bun as its package manager",
  "attribute_id": "..."
}
```
Job payload for `structural_dependency`:

```json
{
  "memory_id": "...",
  "entity_id": "...",
  "entity_name": "...",
  "fact_content": "ooIDE uses WorkOS AuthKit for authentication",
  "target_entity_name": "WorkOS"
}
```
3. Pass 2a: Structural classification worker
New file: packages/daemon/src/pipeline/structural-classify.ts
This worker:
- Leases `structural_classify` jobs in batches (group by entity_id, max 8-10 per batch)
- Loads the entity's existing aspects from `entity_aspects`
- Builds the classification prompt (see Prompt Specifications below)
- Parses the LLM response
- For each classified fact:
  - Upserts the aspect via `upsertAspect()` from `knowledge-graph.ts`
  - Updates the `entity_attributes` row: sets `aspect_id` and `kind`
- Marks jobs completed
Batching strategy: Group pending structural_classify jobs by
entity_id. Process one entity’s batch at a time. This ensures the
prompt has accurate “existing aspects” context.
Error handling: If the LLM returns malformed JSON or drops facts from the batch, mark only the successfully parsed facts as completed. Failed facts stay pending for retry.
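The batching strategy above can be sketched as a pure grouping step; `Job` here is a stand-in for the real leased-job row:

```typescript
// Group leased jobs by entity so one LLM call sees at most `maxPerBatch`
// facts, all for the same entity. Jobs from different entities never share
// a batch, which keeps the "existing aspects" context in the prompt accurate.
interface Job { id: string; entityId: string }

function groupByEntity(jobs: Job[], maxPerBatch = 8): Job[][] {
  const byEntity = new Map<string, Job[]>();
  for (const job of jobs) {
    const bucket = byEntity.get(job.entityId) ?? [];
    bucket.push(job);
    byEntity.set(job.entityId, bucket);
  }
  const batches: Job[][] = [];
  for (const bucket of byEntity.values()) {
    // Split oversized buckets so no batch exceeds the model's reliable limit.
    for (let i = 0; i < bucket.length; i += maxPerBatch) {
      batches.push(bucket.slice(i, i + maxPerBatch));
    }
  }
  return batches;
}
```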
4. Pass 2b: Dependency extraction worker
New file: packages/daemon/src/pipeline/structural-dependency.ts
This worker:
- Leases `structural_dependency` jobs in batches (max 5 per batch)
- Builds the dependency prompt (see Prompt Specifications below)
- Parses the LLM response
- For each identified dependency:
  - Resolves the target entity in the graph (by `canonical_name`)
  - Calls `upsertDependency()` from `knowledge-graph.ts`
- Marks jobs completed
Pre-filter: Only facts whose extraction triples have a target
entity that exists in the entities table should get
structural_dependency jobs. Skip self-referential facts.
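Target resolution by `canonical_name` might look like the following sketch. The exact normalization rule (lowercase, collapsed whitespace) is an assumption here and should come from KA-1's helpers:

```typescript
// Assumed normalization: canonical_name is a lowercased, whitespace-collapsed
// form of the entity name. Replace with the real KA-1 canonicalizer.
function canonicalName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Resolve the LLM's dep_target string against the entities table (here a
// plain array stand-in). Returns the entity id, or null if no match --
// unresolvable targets should be skipped, not guessed.
function resolveTarget(
  depTarget: string,
  entities: ReadonlyArray<{ id: string; canonical_name: string }>,
): string | null {
  const wanted = canonicalName(depTarget);
  return entities.find((e) => e.canonical_name === wanted)?.id ?? null;
}
```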
5. Worker lifecycle integration
Wire both workers into the daemon alongside the existing extraction worker. They should:
- Start after the extraction worker is initialized
- Share the same `LlmProvider` instance
- Respect the `procedural.enabled` and `mutationsFrozen` config flags
- Poll on a configurable interval (separate from extraction polling)
- Stop cleanly in `cleanup()`
Suggested config additions to `PipelineV2Config`:

```ts
readonly structural?: {
  readonly enabled: boolean;            // default true
  readonly classifyBatchSize: number;   // default 8
  readonly dependencyBatchSize: number; // default 5
  readonly pollIntervalMs: number;      // default 10000
};
```
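Applying those defaults could be a one-line spread; `StructuralConfig` and `resolveStructuralConfig` are illustrative names, not existing exports:

```typescript
// Resolved (non-optional) form of the suggested structural config block.
interface StructuralConfig {
  readonly enabled: boolean;
  readonly classifyBatchSize: number;
  readonly dependencyBatchSize: number;
  readonly pollIntervalMs: number;
}

// Defaults from the sprint brief.
const STRUCTURAL_DEFAULTS: StructuralConfig = {
  enabled: true,
  classifyBatchSize: 8,
  dependencyBatchSize: 5,
  pollIntervalMs: 10_000,
};

// Merge user-supplied overrides over the defaults; an absent block means
// "all defaults" (structural assignment enabled).
function resolveStructuralConfig(partial?: Partial<StructuralConfig>): StructuralConfig {
  return { ...STRUCTURAL_DEFAULTS, ...partial };
}
```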
6. Aspect type suggestions
New file or constant map:
packages/daemon/src/pipeline/aspect-suggestions.ts
A mapping from entity type to suggested aspect names. Used in the classification prompt to guide the LLM:
```ts
export const ASPECT_SUGGESTIONS: Record<string, readonly string[]> = {
  project: [
    'architecture', 'dependencies', 'deployment', 'auth',
    'data model', 'testing', 'team', 'configuration',
    'development workflow', 'api', 'frontend', 'backend',
    'infrastructure', 'security',
  ],
  person: [
    'preferences', 'communication style', 'expertise',
    'projects', 'decision patterns', 'background',
    'boundaries', 'work habits',
  ],
  tool: [
    'capabilities', 'configuration', 'integration',
    'usage patterns', 'limitations',
  ],
  system: [
    'architecture', 'endpoints', 'configuration',
    'dependencies', 'security', 'monitoring',
  ],
  concept: [
    'definition', 'relationships', 'applications',
    'constraints',
  ],
  skill: [
    'capabilities', 'usage', 'configuration',
    'triggers', 'limitations',
  ],
  task: [
    'requirements', 'dependencies', 'status',
    'blockers', 'deliverables',
  ],
  unknown: [
    'general', 'relationships', 'properties',
  ],
};
```
Prompt Specifications
These prompts were tested against qwen3:4b via Ollama on 2026-03-04. Results documented below each prompt.
Classification Prompt (Pass 2a)
```
Classify each fact into an aspect and kind for the given entity.

Entity: {entityName} ({entityType})
Existing aspects: {existingAspects | "[none yet]"}
Suggested: {ASPECT_SUGGESTIONS[entityType].join(", ")}

Facts:
1. {fact1}
2. {fact2}
...
N. {factN}

JSON array, each: {"i": number, "aspect": string, "kind": "attribute"|"constraint", "new": boolean}
/no_think
```
Template variables:
- `{entityName}` — entity.name from DB
- `{entityType}` — entity.entity_type from DB
- `{existingAspects}` — comma-separated list of existing aspect names for this entity, or `"[none yet]"` if empty
- `{facts}` — numbered list of fact content strings
- Max 8-10 facts per batch
LLM settings:
- Model: same as the extraction worker (default `qwen3:4b`)
- Temperature: 0.1
- `/no_think` appended to suppress chain-of-thought
Expected output:
```json
[
  {"i": 1, "aspect": "auth system", "kind": "attribute", "new": false},
  {"i": 2, "aspect": "boundaries", "kind": "constraint", "new": true}
]
```
Field definitions:
- `i` — 1-indexed fact number matching the input list
- `aspect` — existing aspect name OR a new aspect name to create
- `kind` — `"attribute"` for regular facts, `"constraint"` for rules that must always be followed
- `new` — `true` if this creates a new aspect, `false` if using an existing one
Tested results (qwen3:4b):
- 10 facts, project entity: 9/10 correct classifications, 1 debatable (monorepo structure → “dependencies” vs “architecture”)
- 12 facts, person entity: 12/12 correct with tighter prompt format
- 20 facts: degraded — dropped 4 facts, lost format discipline. Hard limit is ~10 facts per batch.
- Constraint detection is strong: “never push to main”, “must include agent_id”, “always run typecheck” all correctly identified
- New aspect creation works: model suggests “frontend”, “backend”, “boundaries” when not in existing list
- Existing aspect reuse works: model correctly uses existing aspects
and sets
new: false
Known failure modes:
- >12 facts: starts dropping facts from output
- >15 facts: may hallucinate fact content or change field names
- Verbose prompt preamble: causes format confusion. Keep it minimal.
- Long field names in the JSON schema: use short names (`i`, `aspect`, `kind`, `new`)
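Putting the template together, a prompt builder might look like this sketch. `buildClassifyPrompt` is an illustrative name, and the inlined `SUGGESTIONS` stub stands in for the real `ASPECT_SUGGESTIONS` map from `aspect-suggestions.ts`:

```typescript
// Stub standing in for ASPECT_SUGGESTIONS; the real map covers all entity types.
const SUGGESTIONS: Record<string, readonly string[]> = {
  project: ['architecture', 'dependencies', 'auth'],
  unknown: ['general', 'relationships', 'properties'],
};

// Assemble the pass 2a classification prompt exactly per the template:
// minimal preamble, numbered facts, short JSON field names, /no_think suffix.
function buildClassifyPrompt(
  entityName: string,
  entityType: string,
  existingAspects: string[],
  facts: string[],
): string {
  const existing = existingAspects.length > 0 ? existingAspects.join(', ') : '[none yet]';
  const suggested = (SUGGESTIONS[entityType] ?? SUGGESTIONS.unknown).join(', ');
  const numbered = facts.map((f, i) => `${i + 1}. ${f}`).join('\n');
  return [
    'Classify each fact into an aspect and kind for the given entity.',
    `Entity: ${entityName} (${entityType})`,
    `Existing aspects: ${existing}`,
    `Suggested: ${suggested}`,
    'Facts:',
    numbered,
    'JSON array, each: {"i": number, "aspect": string, "kind": "attribute"|"constraint", "new": boolean}',
    '/no_think',
  ].join('\n');
}
```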
Dependency Prompt (Pass 2b)
```
Classify each fact. Also identify if the fact implies a dependency between entities.

Entity: {entityName} ({entityType})
Aspects: {existingAspects}
Dependency types: uses, requires, owned_by, blocks, informs

1. {fact1}
2. {fact2}
...
N. {factN}

For each fact return: {"i": N, "aspect": "...", "kind": "attribute"|"constraint", "dep_target": "entity or null", "dep_type": "type or null"}
/no_think
```
Template variables:
- Same as classification prompt
- Max 5 facts per batch (stricter limit due to more complex output)
LLM settings:
- Model: same as the extraction worker (default `qwen3:4b`)
- Temperature: 0.1
- `/no_think` appended
Expected output:
```json
[
  {
    "i": 1,
    "aspect": "auth system",
    "kind": "attribute",
    "dep_target": "WorkOS AuthKit",
    "dep_type": "uses"
  },
  {
    "i": 2,
    "aspect": "development workflow",
    "kind": "attribute",
    "dep_target": null,
    "dep_type": null
  }
]
```
Field definitions:
- `i` — 1-indexed fact number
- `aspect` — aspect classification (bonus: also classifies here)
- `kind` — attribute or constraint
- `dep_target` — name of the target entity if a dependency exists, else null
- `dep_type` — one of `uses`, `requires`, `owned_by`, `blocks`, `informs`, or null if no dependency
Dependency type semantics:
- `uses` — entity actively uses the target (ooIDE uses WorkOS)
- `requires` — entity cannot function without the target (backend requires PostgreSQL)
- `owned_by` — entity is owned/maintained by the target (ooIDE owned_by Nicholai)
- `blocks` — entity blocks progress on the target (auth flow blocks deployment)
- `informs` — entity's design was influenced by the target (testing informed by Signet pipeline)
Tested results (qwen3:4b):
- 5 facts: 5/5 correct format, 4/5 correct dependencies
- Correctly identified: WorkOS → uses, React 19 → requires, PostgreSQL → requires
- Correctly returned null for facts with no dependency
- One miss: Nicholai as primary developer → no dependency detected (expected owned_by, got null). Acceptable — person→project ownership is subtle.
Known failure modes:
- >5 facts: starts dropping facts or hallucinating content
- Combined with a verbose prompt: loses format, reverts to markdown
- 8+ facts with dependencies: returned only 16/20 in a stress test and changed field names
Design note: The dependency prompt also returns aspect and kind classifications as a bonus. When both pass 2a and 2b run on the same fact, the dependency prompt’s classification can serve as a confirmation signal. If they disagree, the classification prompt (2a) takes precedence since it was designed and tested specifically for that task.
JSON Parsing
Use the same `stripFences` and `tryParseJson` helpers from
`packages/daemon/src/pipeline/extraction.ts`. The model occasionally
wraps output in markdown fences or includes trailing commas.
Additional validation for the structural prompts:
- Verify `i` maps to a valid fact index in the batch
- Verify `aspect` is a non-empty string
- Verify `kind` is exactly `"attribute"` or `"constraint"`
- Verify `dep_type` is one of the valid dependency types or null
- Skip malformed entries; don't fail the whole batch
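A sketch of that per-entry validation, assuming invalid entries are skipped individually rather than failing the batch (`validateEntries` and `ClassifiedEntry` are illustrative names):

```typescript
const DEP_TYPES = new Set(['uses', 'requires', 'owned_by', 'blocks', 'informs']);

// A validated entry from either structural prompt; dep_target/dep_type are
// only present in pass 2b responses.
interface ClassifiedEntry {
  i: number;
  aspect: string;
  kind: 'attribute' | 'constraint';
  dep_target?: string | null;
  dep_type?: string | null;
}

// Keep only entries that pass every check; malformed entries are dropped so
// their jobs stay pending and retry, without failing the whole batch.
function validateEntries(raw: unknown, batchSize: number): ClassifiedEntry[] {
  if (!Array.isArray(raw)) return [];
  const valid: ClassifiedEntry[] = [];
  for (const e of raw) {
    if (typeof e !== 'object' || e === null) continue;
    const o = e as Record<string, unknown>;
    if (typeof o.i !== 'number' || !Number.isInteger(o.i) || o.i < 1 || o.i > batchSize) continue;
    if (typeof o.aspect !== 'string' || o.aspect.trim() === '') continue;
    if (o.kind !== 'attribute' && o.kind !== 'constraint') continue;
    if (o.dep_type != null && !DEP_TYPES.has(o.dep_type as string)) continue;
    valid.push(o as unknown as ClassifiedEntry);
  }
  return valid;
}
```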
Key Files
- `packages/daemon/src/pipeline/worker.ts` — hook pass 1 after writes
- `packages/daemon/src/pipeline/structural-classify.ts` — new, pass 2a
- `packages/daemon/src/pipeline/structural-dependency.ts` — new, pass 2b
- `packages/daemon/src/pipeline/aspect-suggestions.ts` — new, type map
- `packages/daemon/src/knowledge-graph.ts` — CRUD helpers (from KA-1)
- `packages/daemon/src/pipeline/provider.ts` — `LlmProvider`
- `packages/daemon/src/pipeline/extraction.ts` — JSON parsing helpers
- `packages/daemon/src/memory-config.ts` — structural config defaults
- `packages/daemon/src/daemon.ts` — wire worker lifecycle
- `packages/core/src/types.ts` — structural config types
What NOT to Build (KA-3+)
- Traversal query builder (KA-3)
- Session-start context injection from graph (KA-3)
- Constraint surfacing in retrieval (KA-3)
- Predictor structural features (KA-4)
- Checkpoint structural snapshots (KA-5)
- Dashboard visualization (KA-5)
- API endpoints for aspects/attributes (KA-3)
- Backfill worker for legacy memories (separate sprint after KA-2)
Verification
- `bun run build` — no type errors
- `bun test` — existing tests pass
- `bun run typecheck` — clean
- Save a memory via the daemon, verify:
  - Extraction runs (existing behavior)
  - Pass 1 creates a stub `entity_attributes` row with `aspect_id = NULL`
  - `structural_classify` job enqueued in `memory_jobs`
  - `structural_dependency` job enqueued (if the fact has a target entity)
- Wait for the pass 2a worker to run, verify:
  - `entity_attributes.aspect_id` populated
  - `entity_attributes.kind` set to attribute or constraint
  - `entity_aspects` row created if new aspect
- Wait for the pass 2b worker to run, verify:
  - `entity_dependencies` row created for facts with dependencies
  - Target entity resolved by `canonical_name`
- Save a fact like "never push directly to main" — verify it gets `kind = 'constraint'`
- Save 3 facts about the same entity — verify they batch into one LLM call for classification
- Verify a malformed LLM response doesn't crash the worker (graceful skip)
- Verify `structural.enabled = false` disables both workers