MEMORY.md Rolling Window Lineage
Spec metadata:
- ID: memory-md-rolling-window-lineage
- Status: complete
- Hard depends on: memory-md-temporal-head, lossless-working-memory-runtime, session-continuity-protocol
- Registry: docs/specs/INDEX.md
Problem
MEMORY.md currently compresses too aggressively for high-frequency usage.
Under 50+ sessions/day, in-window sessions can be omitted or reduced to weak
truncations, and drill-down can depend on tool surfaces instead of stable file
lineage.
The deeper failure mode is source-of-truth ambiguity between markdown and DB history representations.
Core decisions (normative)
- Canonical historical content consists only of summary, transcript, and compaction markdown artifacts.
- MEMORY.md is a derived, rebuildable projection and is not canonical history.
- Window membership and day grouping use UTC and are computed from ended_at, or captured_at when ended_at is null.
- Metadata that can appear after session end (for example later compaction linkage) must not require mutation of immutable content artifacts.
- LLM usage in this lane is scoped to generating one sentence stored in artifact frontmatter; MEMORY.md rendering itself is programmatic.
Workspace root and link model
Workspace root
Workspace root is SIGNET_WORKSPACE (default ~/.agents).
Canonical artifacts live under:
<workspace_root>/memory/
Link format
All cross-document links MUST use Obsidian wikilinks with paths relative to workspace root.
Valid examples:
- [[memory/2026-03-28T22-34-06.792Z--h7k3n4p9m2x1q8r5--summary.md]]
- [[memory/2026-03-28T22-34-06.792Z--h7k3n4p9m2x1q8r5--transcript.md|transcript]]
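The link rule can be illustrated with a small helper. This is a minimal sketch, not the project's implementation: the function name `wikilink` and its parameters are hypothetical, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath
from typing import Optional


def wikilink(workspace_root: str, target: str, label: Optional[str] = None) -> str:
    """Build an Obsidian wikilink whose path is relative to the workspace root.

    Raises ValueError if target is not located under workspace_root.
    """
    rel = PurePosixPath(target).relative_to(PurePosixPath(workspace_root))
    path = rel.as_posix()
    # An optional label is rendered after a pipe, as in the valid examples above.
    return f"[[{path}|{label}]]" if label else f"[[{path}]]"
```

Rejecting targets outside the workspace root keeps absolute or escaping paths from leaking into cross-document links.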
Authority split
Markdown authority (canonical)
Markdown is authoritative for historical content:
- summary narratives
- sanitized transcript text
- compaction narratives
- canonical cross-document lineage links
Database authority (DB-native)
Database is authoritative for runtime state that markdown cannot model faithfully:
- temporal telemetry (timing, decay, ranking, access counters)
- runtime execution metadata and queues
- graph state (entities, aspects, attributes, dependencies, relations)
Conflict policy
- For historical content fields: markdown wins, DB is repaired by re-index.
- For runtime telemetry fields: DB wins.
Canonical artifact model
Immutable content artifacts
Kinds:
- summary (--summary.md)
- transcript (--transcript.md), sanitized form only
- compaction (--compaction.md)
These files are immutable after first successful commit.
Mutable session manifest
Each session gets one mutable manifest file:
--manifest.md
The manifest is the only file that may gain new links after session end (for example compaction arriving later).
File naming
{captured_at_iso_fs}--{session_token}--{kind}.md
- captured_at_iso_fs: UTC timestamp with filesystem-safe separators
- session_token: deterministic token (see token contract below)
- kind: summary, transcript, compaction, manifest
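A possible naming helper is sketched below. The exact filesystem-safe format is an assumption inferred from the example paths in this spec (colons replaced by hyphens, millisecond precision, trailing Z); the function names are hypothetical.

```python
from datetime import datetime, timezone

KINDS = {"summary", "transcript", "compaction", "manifest"}


def captured_at_iso_fs(ts: datetime) -> str:
    """Format a timezone-aware datetime as a UTC, filesystem-safe timestamp."""
    if ts.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    # Assumed format: hyphens instead of colons, milliseconds, Z suffix.
    return utc.strftime("%Y-%m-%dT%H-%M-%S.") + f"{utc.microsecond // 1000:03d}Z"


def artifact_filename(ts: datetime, session_token: str, kind: str) -> str:
    """Build {captured_at_iso_fs}--{session_token}--{kind}.md."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    return f"{captured_at_iso_fs(ts)}--{session_token}--{kind}.md"
```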
Frontmatter contract
Immutable content artifact frontmatter
Required:
- kind
- agent_id
- session_id
- session_key
- project
- harness
- captured_at
- started_at (nullable)
- ended_at (nullable)
- manifest_path
- source_node_id (nullable)
- content_sha256
- hash_scope (must equal body-normalized-v1)
- sanitizer_version (required for transcript kind)
- memory_sentence (one-sentence session summary used by MEMORY.md)
- memory_sentence_version (prompt/schema version for sentence generation)
- memory_sentence_quality (ok or fallback)
- memory_sentence_generated_at
Not required on immutable files:
- compaction_path
Mutable manifest frontmatter
Required:
- kind (value manifest)
- agent_id
- session_id
- session_key
- project
- harness
- captured_at
- summary_path
- transcript_path
- compaction_path (nullable, may be set later)
- memory_md_refs (list of MEMORY.md entries that include this session)
- updated_at
Checksum scope
content_sha256 hashes normalized markdown body only, excluding frontmatter.
Normalization contract:
- LF line endings
- trailing whitespace removed per line
- no trailing blank lines at EOF
- UTF-8 bytes over normalized body
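The normalization steps above can be sketched as follows. Two points are assumptions not stated in this spec: frontmatter is taken to be a leading block delimited by `---` lines, and the normalized body ends without a final newline. The function names are hypothetical.

```python
import hashlib


def strip_frontmatter(text: str) -> str:
    """Drop a leading frontmatter block delimited by '---' lines (assumed format)."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            return text[end + len("\n---\n"):]
    return text


def normalize_body(body: str) -> bytes:
    """Apply LF endings, strip trailing whitespace per line, drop trailing blank lines."""
    text = body.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines).encode("utf-8")


def content_sha256(document: str) -> str:
    """Hash the normalized body only, excluding frontmatter."""
    return hashlib.sha256(normalize_body(strip_frontmatter(document))).hexdigest()
```

Because frontmatter is excluded, two documents with the same body but different metadata produce the same hash under this sketch.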
Session token contract
Token derivation (deterministic, collision-resistant):
- canonical session identity = session_key when present, else session_id
- seed = ${agent_id}:${canonical_session_identity}
- token = first 16 chars of lowercase base32(sha256(seed))
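The derivation can be sketched as below. The base32 alphabet is an assumption: this sketch uses the RFC 4648 alphabet from Python's standard library, which may differ from the alphabet the project actually uses. Treating an empty session_key as absent is also an assumption.

```python
import base64
import hashlib
from typing import Optional


def session_token(agent_id: str, session_id: str, session_key: Optional[str] = None) -> str:
    """Derive a deterministic 16-character token for a session."""
    # Assumption: an empty session_key counts as "not present".
    identity = session_key or session_id
    seed = f"{agent_id}:{identity}"
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    # RFC 4648 base32 (assumed alphabet), lowercased, first 16 characters.
    return base64.b32encode(digest).decode("ascii").lower()[:16]
```

Including agent_id in the seed means the same session identity under two agents yields two different tokens.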
Display short IDs (for UI readability) are non-authoritative aliases and must never be used as primary storage keys.
MEMORY.md projection contract
Status
MEMORY.md is a derived view over canonical artifacts plus DB-native runtime
signals. It is always rebuildable.
Required section
## Session Ledger (Last 30 Days) is mandatory.
Window semantics (strict)
At render time t_now in UTC:
- include sessions where membership_ts is in [t_now - 30 days, t_now]
- membership_ts = ended_at when present, else captured_at
- day buckets use membership_ts UTC date
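A minimal sketch of these window rules, assuming all timestamps are timezone-aware datetimes. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW = timedelta(days=30)


def membership_ts(captured_at: datetime, ended_at: Optional[datetime]) -> datetime:
    """Use ended_at when present, else captured_at, normalized to UTC."""
    ts = ended_at if ended_at is not None else captured_at
    return ts.astimezone(timezone.utc)


def in_window(ts: datetime, t_now: datetime) -> bool:
    """Closed interval [t_now - 30 days, t_now]."""
    return t_now - WINDOW <= ts <= t_now


def day_bucket(ts: datetime) -> str:
    """Group by the UTC calendar date of the membership timestamp."""
    return ts.astimezone(timezone.utc).date().isoformat()
```

Bucketing by UTC date means a session that ends late in the evening in a western time zone can land in the next day's bucket.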
Projection excludes temp/test sessions from visible lineage surfaces. For
projectable sessions, the renderer must prefer a fixed output budget over
unbounded growth: high-signal head sections stay intact, then the oldest ledger
rows may be clipped with an explicit notice in MEMORY.md.
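One way to read the fixed-budget rule is sketched below. The budget unit (characters), the notice wording, and the function name are all assumptions; the spec only requires that head sections stay intact and that clipping is announced explicitly.

```python
from typing import List


def clip_ledger(head: str, rows_newest_first: List[str], budget_chars: int) -> List[str]:
    """Keep the newest rows that fit after the head; append a notice if any were clipped."""
    remaining = budget_chars - len(head)
    kept: List[str] = []
    for row in rows_newest_first:
        cost = len(row) + 1  # +1 for the newline that follows each row
        if cost > remaining:
            break
        kept.append(row)
        remaining -= cost
    clipped = len(rows_newest_first) - len(kept)
    if clipped:
        # Hypothetical notice text; it is not counted against the budget in this sketch.
        kept.append(f"{clipped} older session rows clipped to fit the output budget.")
    return kept
```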
Per-session sentence quality floor
Each in-window row uses memory_sentence from artifact frontmatter.
memory_sentence must satisfy all checks:
- 12-48 words
- terminal punctuation (one of . ! ?)
- contains at least one concrete anchor from session context:
  - project basename, or
  - file/package path token, or
  - issue/PR/task identifier, or
  - named component/system token
- not equal to known low-signal templates ("Investigated issue.", "Worked on task.", "Reviewed code.")
If LLM output fails checks, runtime must store a deterministic fallback sentence
and set memory_sentence_quality: fallback.
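The quality gate and fallback can be sketched as follows. How anchors are extracted from session context is not specified here, so this sketch takes them as a caller-supplied set; anchor matching by case-insensitive substring is also an assumption, and the function names are hypothetical.

```python
from typing import Set, Tuple

LOW_SIGNAL = {"investigated issue.", "worked on task.", "reviewed code."}


def passes_quality_floor(sentence: str, anchors: Set[str]) -> bool:
    """Check word count, terminal punctuation, low-signal templates, and anchors."""
    text = sentence.strip()
    if not 12 <= len(text.split()) <= 48:
        return False
    if not text.endswith((".", "!", "?")):
        return False
    lowered = text.lower()
    if lowered in LOW_SIGNAL:
        return False
    # Assumed matching rule: at least one non-empty anchor appears as a substring.
    return any(a.lower() in lowered for a in anchors if a)


def finalize_sentence(llm_output: str, anchors: Set[str], fallback: str) -> Tuple[str, str]:
    """Return (memory_sentence, memory_sentence_quality)."""
    if passes_quality_floor(llm_output, anchors):
        return llm_output.strip(), "ok"
    return fallback, "fallback"
```

The fallback sentence is passed in rather than generated, since its deterministic construction is left to the runtime.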
Canonical row shape
- 2026-03-28T22:34:06.792Z | session=a245b4fc-b607-4c50-8566-ebe23264272f | project=/home/nicholai/signet/signetai | Finalized DP-19 write-gate clamping decisions and queued scope-aware dedup parity validation before merge. [[memory/2026-03-28T22-34-06.792Z--h7k3n4p9m2x1q8r5--summary.md|summary]] [[memory/2026-03-28T22-34-06.792Z--h7k3n4p9m2x1q8r5--transcript.md|transcript]] [[memory/2026-03-28T22-34-06.792Z--h7k3n4p9m2x1q8r5--manifest.md|manifest]]
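A programmatic renderer for this row shape might look like the sketch below. The function name and parameters are hypothetical; the link set (summary, transcript, manifest) simply mirrors the example row.

```python
def ledger_row(timestamp: str, session_id: str, project: str,
               sentence: str, artifact_stem: str) -> str:
    """Render one ledger row; artifact_stem is '{captured_at_iso_fs}--{session_token}'."""
    links = " ".join(
        f"[[memory/{artifact_stem}--{kind}.md|{kind}]]"
        for kind in ("summary", "transcript", "manifest")
    )
    return f"- {timestamp} | session={session_id} | project={project} | {sentence} {links}"
```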
Write and crash state model
Write ordering (source-of-truth first)
Session-end:
- sanitize transcript (deterministic sanitizer)
- generate one-sentence memory_sentence via LLM
- validate memory_sentence against quality floor, else deterministic fallback
- write immutable transcript artifact with sentence metadata in frontmatter
- write immutable summary artifact with sentence metadata in frontmatter
- write/create manifest with summary+transcript paths
- update MEMORY.md projection from frontmatter + deterministic metadata
- upsert derived content-linked DB rows
- continue DB-native telemetry updates
Compaction-complete:
- generate compaction memory_sentence via LLM
- validate sentence, else deterministic fallback
- write immutable compaction artifact with sentence metadata in frontmatter
- update manifest compaction_path + updated_at
- update MEMORY.md projection from frontmatter + deterministic metadata
- upsert derived content-linked DB rows
- reset live transcript buffers only after canonical writes commit
Partial-failure states
Implementation must model and recover these states explicitly:
- transcript written, summary failed
- sentence generated, artifact write failed
- sentence generation failed, fallback path engaged
- summary/transcript written, manifest failed
- artifacts+manifest written, MEMORY.md update failed
- canonical writes done, DB index update failed
Recovery must resume idempotently using manifest + checksums.
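Idempotent resume can be pictured as "find the first incomplete step in write order and continue from there". The sketch below covers the session-end flow only. The state fields and step names are hypothetical; in practice each flag would be derived by checking the manifest and verifying checksums, not stored as a boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionEndState:
    transcript_written: bool
    summary_written: bool
    manifest_written: bool
    memory_md_updated: bool
    db_indexed: bool


# Ordered to match the session-end write ordering (source of truth first).
STEPS = (
    ("transcript_written", "write_transcript"),
    ("summary_written", "write_summary"),
    ("manifest_written", "write_manifest"),
    ("memory_md_updated", "update_memory_md"),
    ("db_indexed", "upsert_db_rows"),
)


def next_step(state: SessionEndState) -> Optional[str]:
    """Return the first incomplete step, or None when the flow is complete."""
    for field_name, step in STEPS:
        if not getattr(state, field_name):
            return step
    return None
```

Calling next_step repeatedly after each successful step replays the remainder of the flow without redoing committed writes.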
Sanitization contract
Transcript sanitization must be deterministic and versioned.
Required:
- function id: sanitize_transcript_v1
- stable redaction policy and normalization order
- explicit upgrades via new versions (v2, v3) with migration notes
Sentence generation must also be deterministic at contract level:
- function id: memory_sentence_v1
- strict output target: exactly one sentence
- quality gate + deterministic fallback path
Derived DB contract
DB rows for content lineage/search are derived indexes over canonical files.
Required:
- source_path (workspace-root-relative)
- source_sha256
- source_kind
- agent_id
- session identity fields
Re-index rebuilds content-linked rows from canonical artifacts for target scope. Runtime telemetry rows are preserved unless explicitly reset by operator action.
Re-index, deletion, and privacy removal
Re-index
- scan canonical artifacts in scope
- validate frontmatter + link graph + checksums
- rebuild content-linked DB rows
- regenerate MEMORY.md
Deletion/removal
Privacy-driven removal must write tombstones and remove linked DB rows for content lineage.
Tombstone fields:
- agent_id
- session_token
- removed_at
- reason
- removed_paths
Re-index must honor tombstones so deleted content is not resurrected.
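A tombstone-aware scan could filter candidate artifacts before rebuilding rows, as in this sketch. The Tombstone type mirrors the fields listed above; the function name, and the choice to match on removed_paths within the same agent, are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Tombstone:
    agent_id: str
    session_token: str
    removed_at: str
    reason: str
    removed_paths: Tuple[str, ...]


def reindexable_paths(agent_id: str, artifact_paths: Iterable[str],
                      tombstones: Iterable[Tombstone]) -> List[str]:
    """Drop any artifact path that a tombstone for this agent marks as removed."""
    removed = {
        path
        for stone in tombstones
        if stone.agent_id == agent_id
        for path in stone.removed_paths
    }
    return [p for p in artifact_paths if p not in removed]
```

Even if a removed file reappears on disk (for example from a backup restore), this filter keeps it out of the rebuilt index.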
Concurrency contract
High-volume workloads require explicit write coordination.
Required:
- lease-based writer lock for MEMORY.md projection updates
- per-session manifest compare-and-swap revisioning
- atomic file replace semantics for canonical writes
- retry-safe idempotency keys for session-end and compaction flows
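Of the requirements above, atomic file replace is the simplest to illustrate. The sketch below writes to a temporary file in the same directory and then renames it over the target; it covers only that one item (not leases, compare-and-swap, or idempotency keys), and the function name is hypothetical.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A crash before os.replace leaves the previous file untouched, which fits the recovery model of resuming from the last committed artifact.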
Implementation plan
Phase 0: contracts and helpers
- add canonical naming + token helper
- add body hash helper (body-normalized-v1)
- add wikilink helper (workspace-root-relative)
- add deterministic sanitizer version plumbing
- add memory_sentence frontmatter schema + quality gate helpers
Phase 1: canonical artifact + manifest writers
- write immutable summary/transcript artifacts
- add mutable manifest lifecycle
- write immutable compaction artifacts + manifest backfill
- add crash-recovery state handling
- wire LLM one-sentence generation into summary/compaction artifact frontmatter
Phase 2: MEMORY.md renderer
- render strict rolling 30-day UTC ledger
- read per-session sentence from frontmatter
- enforce per-session sentence quality floor + fallback flag handling
- enforce fixed-budget clipping of oldest ledger rows with explicit notice
Phase 3: derived DB indexing and re-index
- upsert content-linked DB rows with path+hash pointers
- implement idempotent re-index from markdown
- add tombstone-aware deletion handling
Phase 4: docs and safeguards
- document authority split in API/HOOKS/HARNESSES docs
- add operator docs for re-index + privacy removal
- add regression suite for contracts above
Validation and regression tests
- 1,500-session window test (50/day x 30) preserves the newest in-window ledger rows, clips oldest rows when needed, and emits an explicit clipping notice.
- sentence floor test rejects low-signal rows.
- wikilink format test enforces workspace-root-relative links.
- immutable artifact test rejects post-commit mutation.
- manifest mutability test allows late compaction_path updates only in manifest.
- checksum scope test verifies body-normalized-v1 behavior.
- sanitizer determinism test ensures stable output for same input.
- partial-failure recovery tests cover each state model branch.
- re-index parity test rebuilds content-linked DB rows from markdown.
- runtime telemetry preservation test proves re-index does not clobber DB-native temporal counters/ranks.
- tombstone test proves deletion/removal survives re-index.
- multi-agent scoping test proves no cross-agent bleed.
- frontmatter sentence projection test proves MEMORY.md rows are sourced from artifact memory_sentence fields (not full-file LLM rewrites).
Risks and mitigations
- Risk: large MEMORY.md at extreme session volume. Mitigation: enforce sentence length band, keep deep detail in linked artifacts.
- Risk: disk growth from transcript artifacts. Mitigation: sanitized content only, plus out-of-window archival policy.
- Risk: operational complexity from manifest + recovery states. Mitigation: explicit state machine tests and idempotent replay tooling.
- Risk: future regressions reintroduce dual-canonical ambiguity. Mitigation: CI guardrails on authority split and path/hash contracts.
Open design questions
- Should sentence length band be global or recency-tiered?
- Should out-of-window archival produce monthly markdown bundles?
- Should CLI ship signet memory reindex and signet memory open <session> in this wave or follow-up?