Signet Memory Pipeline v2: Implementation Specification
Status: Draft for implementation
Audience: Core + Daemon maintainers
Scope note: This document is a delivery spec. It intentionally defines behavior, contracts, and validation criteria. It does not include implementation code.
1) Purpose
Signet needs a memory pipeline that is not only searchable, but selective, self-correcting, auditable, and safe under concurrency.
This spec hardens the prior plan into an implementation-ready design with:
- explicit data contracts and lifecycle states
- durable async processing
- lock-safe transaction boundaries
- provider and privacy controls
- graph-aware retrieval
- measurable quality, latency, reliability, and cost targets
2) Product Objectives
Primary goals
- Increase recall relevance and consistency over current append-only memory behavior.
- Prevent duplicate and contradictory memory growth.
- Keep user data safe under provider outages and concurrent writes.
- Preserve reversibility through complete memory history and recovery.
- Keep local-first defaults, with optional remote provider support.
- Enable bounded autonomous maintenance so agents can diagnose and repair common failure states without human intervention.
Non-goals (for this release)
- Full multi-hop graph reasoning beyond one-hop expansion.
- Autonomous memory pruning by LLM without recoverable audit trail.
- External graph databases (Neo4j, etc.).
- Mandatory online provider dependency.
3) Success Criteria (Release Gates)
The release is complete only if all are true:
- No data loss when extraction/LLM fails (raw memory persists).
- Pinned memories cannot be deleted by model decisions.
- Duplicate insert race for identical content is blocked at DB level.
/rememberremains responsive under provider outage via async fallback and retry queue.- Search quality improves against baseline on offline eval set.
- History endpoint shows complete ADD/UPDATE/DELETE lineage.
- Soft-deleted memories are recoverable during retention window.
- Agents can explicitly modify a memory by ID with audit-safe semantics.
- Agents can explicitly forget by ID or query without hard delete.
- Autonomous maintenance loops resolve common degradation states within SLO and always leave an auditable trail.
4) Current Constraints and Assumptions
- Memory pipeline is migrating from Python subprocesses into daemon TypeScript.
- Existing schemas in the wild are mixed (legacy daemon schema, unified core schema, and migration edge cases).
- Daemon currently opens multiple SQLite connections per request path; this is acceptable short term, but write contention increases under LLM-stage latency.
- sqlite-vec remains the vector backend for this release.
- Default model path is local Ollama unless user explicitly chooses remote provider.
4.1 Mem0 comparison findings (implemented behavior)
Review source:
references/mem0/mem0/memory/main.pyreferences/mem0/mem0-ts/src/oss/src/memory/index.tsreferences/mem0/server/main.py
Observed parity requirements:
- Mem0 supports both inferred and explicit mutation paths:
- inferred: ADD/UPDATE/DELETE/NONE in add pipeline
- explicit: update(id), delete(id), delete_all(filters), history(id)
- Mem0 treats update/delete as first-class operations with history entries for each mutation.
- Mem0 exposes API surface for modify/forget (
PUT /memories/{id}andDELETE /memories/{id}), not only infer-on-add behavior.
Signet implication: inferred mutation is not enough; agent-directed modify/forget must be a formal API + policy surface.
4.2 OpenClaw integration baseline and gap
Current Signet state in-repo:
@signet/connector-openclawis install-time wiring that patches OpenClaw config and installs command-hook files under~/.agents/hooks/agent-memory/.- Installed hook handler supports command actions
/remember,/recall, and/contextvia daemon hook endpoints. @signetai/adapter-openclawexists as a runtime plugin entry point and currently exposes lifecycle calls for session start/pre-compaction/ compaction complete plus manual remember/recall helpers.- Daemon hook surface already includes richer lifecycle endpoints, including user-prompt-submit and session-end.
Gap vs desired state:
- Runtime plugin path is not yet the single canonical path for OpenClaw memory operations.
- Legacy command-hook path can overlap with plugin behavior and must be guarded against duplicate recall/capture.
- Explicit modify/forget tools are not yet available in the runtime OpenClaw integration path.
5) High-Level Architecture
5.1 Pipeline modes
remember supports three execution modes:
sync: do extraction and update-decision inline within latency budget.async: persist raw memory immediately and enqueue processing job.auto(default): attempt sync until timeout budget, then degrade to async without failing write.
5.2 Lifecycle stages
- Ingest and normalize input.
- Idempotency and exact dedup check.
- Persist raw memory envelope.
- Run extraction (facts + entities).
- For each fact: retrieve candidates, decide ADD/UPDATE/DELETE/NONE, apply decision.
- Persist embeddings and graph links.
- Persist history and metrics.
5.3 Key principle
No long-running external call is allowed inside a write-locked transaction.
6) Data Model Specification
6.1 memories table
Required fields:
- identity:
id - content:
content,normalized_content,content_hash(SHA-256) - classification:
type,category - source:
source_type,source_id,who - control:
pinned,manual_override,is_deleted,deleted_at - quality/process:
confidence,update_count,extraction_status - retrieval:
importance,access_count,last_accessed - model provenance:
embedding_model,extraction_model - audit stamps:
created_at,updated_at,updated_by,version
Required indexes:
- unique partial index on
content_hashwhere not null - index on
is_deleted,pinned,type,created_at - index on
source_type, source_id
6.2 memory_history table
Purpose: immutable audit trail for all semantic decisions.
Fields:
- identity:
id,memory_id - event:
eventin {ADD, UPDATE, DELETE, RECOVER, MERGE, NONE} - payload:
old_content,new_content,old_metadata,new_metadata - actor:
actor_type,actor_id - model provenance:
provider,model,prompt_version - reasoning metadata:
decision_confidence,decision_reason - traceability:
request_id,session_id,created_at
6.3 memory_jobs table (durable queue)
Purpose: retries, crash-safe processing, and backpressure control.
Fields:
- identity:
id,memory_id - type:
job_typein {extract, decision, graph, reembed} - payload:
payload_json - state:
statusin {pending, leased, retry_scheduled, done, dead} - retries:
attempt_count,max_attempts,next_attempt_at - lease:
lease_owner,lease_expires_at - diagnostics:
last_error,last_error_code - timestamps:
created_at,updated_at
6.4 Graph tables
entities:
id,name,canonical_name,entity_type,mentions,embedding,created_at,updated_at
relations:
id,source_entity_id,target_entity_id,relationship,mentions,confidence,created_at,updated_at
memory_entity_mentions (required link table):
id,memory_id,entity_id,mention_text,confidence,created_at
Without memory_entity_mentions, graph-augmented retrieval cannot map
entity expansion back to memory rows reliably.
7) Migration and Rollback Strategy
7.1 Pre-migration safety
- Create timestamped SQLite backup.
- Verify backup integrity.
- Run schema detection and emit migration plan report.
7.2 Migration behavior
- Use additive migrations first (new columns/tables/indexes).
- Backfill derived fields in batched jobs (
content_hash,normalized_content,embedding_model, etc.). - Never block startup on full backfill; process in background queue.
7.3 Rollback
- Rollback mechanism is DB restore from pre-migration backup.
- Keep migration execution idempotent so re-run is safe.
- Store migration audit record with start/end time and outcome.
8) Concurrency and Transaction Model
8.1 Transaction boundaries
Use short transactions only:
- Tx A: ingest write (insert raw memory envelope + queue job)
- Tx B: apply one semantic decision atomically
- Tx C: finalize metadata/access/history updates
No LLM or embedding call may execute while holding write lock.
8.2 Race prevention
- DB-level uniqueness on
content_hashfor exact duplicate collapse. - Compare-and-set update using memory
versionto prevent stale write. - Re-check candidate state immediately before applying UPDATE/DELETE.
8.3 Connection policy
- Single shared DB accessor in daemon process.
- Read-only handles for search paths where possible.
- Standard pragmas on write connections:
- WAL mode
- busy timeout
- synchronous NORMAL
- memory temp store
9) Provider Abstraction and Policy
9.1 Supported providers
- Local Ollama (default)
- API providers (OpenAI/Anthropic)
- Harness passthrough (Claude Code/OpenCode/OpenClaw)
9.2 Capability contract
Each provider must declare:
- extraction support
- decision support
- optional reranking support
- timeout and token constraints
- structured output reliability level
9.3 Fallback order
- Preferred provider
- Secondary provider (optional)
- Raw-save + async retry queue
If all provider calls fail, memory remains stored with
extraction_status=unprocessed and is retried.
10) Privacy and Security Requirements
10.1 Default posture
- Local-first processing is default.
- Remote providers are opt-in.
10.2 Data handling controls
Before remote inference:
- redact obvious secret patterns (tokens, API keys, private keys)
- redact configured sensitive terms
- preserve reversible placeholder map locally for audit
10.3 Governance controls
- provider allowlist in config
- explicit
local_only=trueenforcement mode - outbound inference logs must avoid storing raw redacted content
10.4 Safety invariants
pinned=1cannot be deleted by model output.- soft-delete only; hard purge only via retention worker.
- all mutating decisions require history event insert in same commit.
11) Extraction and Decision Contracts
11.1 Extraction output contract
Required structure:
facts[]: content + memory type + confidenceentities[]: source, relationship, target, confidencewarnings[]: optional parser/quality issues
Validation rules:
- reject empty or trivial facts
- enforce max fact length
- enforce max output count per request
- schema validation failure triggers raw-save fallback
11.2 Decision output contract
For each candidate memory, model returns one of:
- ADD
- UPDATE
- DELETE
- NONE
Additional fields required:
- target temp id (for UPDATE/DELETE/NONE)
- confidence
- short reason
Invalid target ids or malformed decisions are discarded and recorded as pipeline warnings.
11.3 Contradiction handling
If same batch contains opposing claims for same subject:
- block automatic destructive decision
- store both with contradiction marker
- emit review-needed history event
12) Search and Ranking Specification
12.1 Baseline retrieval
- hybrid vector + BM25 search remains primary path
- excluded from retrieval:
is_deleted=1unless explicit include flag
12.2 Score components
final_score = a*vector + b*bm25 + c*access + d*graph (+ optional rerank)
Where:
accessis logarithmic frequency boostgraphis one-hop connectivity boost- reranker is opt-in and latency-bounded
12.3 Reranking policy
- disabled by default
- only applied to top-N candidates
- bounded timeout; fallback to pre-rerank order on timeout/failure
13) Graph Memory Specification
13.1 Extraction behavior
Entity and relation extraction runs in same pipeline pass as fact extraction, but persisted independently.
13.2 Merge behavior
- entity merge via semantic threshold + canonical name normalization
- relation merge by
(source, relationship, target)tuple - mention counts increment, not duplicate row insertion
13.3 Query-time usage
- extract query entities
- resolve nearest stored entities
- expand one-hop neighbors
- boost linked memories via
memory_entity_mentions
If no entities resolve, search behaves exactly like baseline hybrid.
14) API Specification (Delta)
14.1 POST /api/memory/remember
New request fields:
mode:auto | sync | asyncraw: boolean (bypass extraction)idempotency_key: optional client keypipeline_timeout_ms: optional override within safe bounds
Response includes:
memory_idstatus:processed | queued | raw_savedjob_idwhen queuedwarnings[]
14.2 GET /api/memory/jobs/:id
Returns job state, attempts, next retry, and last error summary.
14.3 GET /api/memory/:id/history
Returns ordered event log with model/provider provenance and reasons.
14.4 POST /api/memory/:id/recover
Recovers soft-deleted memory if still in retention window.
14.5 PATCH /api/memory/:id
Explicit modify endpoint for agent/operator corrections.
Request supports:
content(optional)type,tags,importance,pinned(optional metadata updates)reason(required; stored in history)if_version(optional optimistic concurrency guard)
Behavior:
- Fails on version mismatch when
if_versionis provided. - Updates embedding if content changes.
- Writes
UPDATEevent tomemory_historyin same commit.
14.6 DELETE /api/memory/:id
Explicit forget endpoint (soft delete).
Request/query supports:
reason(required)force(optional; required for pinned memory)
Behavior:
- Default is soft-delete (
is_deleted=1,deleted_at=now). - Rejects pinned delete unless
force=trueand policy permits caller. - Writes
DELETEevent tomemory_history.
14.7 POST /api/memory/forget
Batch forget by query and filters (agent-safe forget flow).
Request supports:
query(semantic or keyword)- optional filters (
type,tags,who,source_type, time range) mode:preview | executelimitreason(required for execute)
Behavior:
previewreturns candidate memory IDs and scores only.executeapplies soft-delete to selected IDs.- For large deletes above threshold, require explicit confirm token.
14.8 POST /api/memory/modify
Batch modify operation for structured edits.
Request supports list of patches:
{ id, content?, tags?, type?, importance?, reason, if_version? }
Behavior:
- Atomic per item, not all-or-nothing across batch.
- Per-item result includes success/failure + conflict reason.
- Each successful item writes
UPDATEhistory event.
14.9 Compatibility
Legacy aliases remain functional, mapped to the new pipeline behavior.
15) Configuration Specification
15.1 Pipeline config block
Config supports:
- provider selection and model per provider
- mode defaults (
auto/sync/async) - timeout and retry budgets
- dedup thresholds
- reranker enablement
- graph enablement
local_onlyprivacy enforcement- mutation policy (allow/deny delete of pinned, max batch delete size)
- forget safeguards (preview required, confirm threshold)
15.2 Safe defaults
- provider: local Ollama
- mode: auto
- reranker: off
- graph boost: low weight
- remote redaction: on
16) Observability and Operations
16.1 Required metrics
Pipeline throughput and reliability:
- remembers total, processed, queued, fallback raw saves
- extraction success rate
- decision parse failure rate
- queue depth, queue age p95, dead-letter count
- retry distribution by attempt
Latency:
- remember end-to-end p50/p95/p99 by mode
- extraction stage latency
- decision stage latency
- search latency p50/p95/p99
Quality indicators:
- duplicate suppression rate
- contradiction flag rate
- update-vs-add ratio
Cost/usage:
- tokens in/out per provider
- estimated cost per 1k remembers
Storage/DB:
- DB file size growth
- sqlite busy/lock error rate
16.2 Logging requirements
Structured logs must include request id, memory id, job id, stage, provider, model, latency, and outcome.
16.3 Alert thresholds
Minimum alerts:
- dead-letter rate > 1% over 15m
- queue age p95 > 5m
- remember p95 > SLO for 30m
- sqlite busy errors above baseline threshold
16.4 Autonomous maintenance plane (self-healing)
The system must expose machine-actionable diagnostics and safe repair actions so agents can maintain memory health.
Required health signals (read-only):
- queue health (
depth,oldest_age,dead_rate, lease anomalies) - storage health (DB size growth, fragmentation, WAL growth)
- index health (FTS/vec consistency and freshness)
- model/provider health (availability, timeout rate, parse failure rate)
- mutation safety health (wrong-target rate, rollback/recover success)
Required repair actions (mutating, policy-gated):
- requeue dead/retryable jobs
- release stale leases
- reindex FTS and vector linkage consistency checks
- reembed subset by model/version drift
- run retention/purge jobs in safe order
- execute targeted rollback/recover for recent destructive mutations
Control constraints:
- all repair actions require reason + actor + correlation id
- every repair action writes an audit event
- per-action rate limits and cooldown windows
- policy-level allowlist of autonomous actions
- emergency kill switch to disable all autonomous mutations
17) Rollout Plan
Phase A: Infrastructure hardening
- schema additions, queue table, history table, new indexes
- transaction boundary refactor
- DB connection access unification
Gate: no regression in existing remember/recall behavior.
Phase B: Shadow extraction
- run extraction + decision in shadow mode
- do not mutate memory semantics yet; only log proposed actions
Gate: acceptable parse reliability and decision quality on real traffic.
Phase C: Controlled writes
- enable ADD/NONE decisions
- keep UPDATE/DELETE behind feature flag
Gate: low duplicate growth, no data-loss incidents.
Phase D: Full semantic decisions
- enable UPDATE/DELETE with safety invariants
- enable recover endpoint and retention worker
Gate: pinned protection and recoverability validated.
Phase E: Graph and optional reranking
- enable graph extraction + low-weight graph boost
- reranker optional per config
Gate: retrieval quality improvement without latency SLO breach.
Phase F: Autonomous maintenance enablement
- expose diagnostics endpoints and health score
- ship policy-gated repair action endpoints
- enable agent maintenance loop in observe-only mode first
- graduate to execute mode after guardrail gates pass
Gate: self-healing actions improve system health metrics without increasing accidental mutation incidents.
Phase G: OpenClaw plugin-first runtime migration
- define canonical runtime path as
@signetai/adapter-openclaw - keep
@signet/connector-openclawas install/bootstrap only - add runtime operations for explicit modify/forget and full lifecycle parity with daemon hook surface
- keep legacy command-hook files as compatibility fallback behind config gate
- enforce single active runtime path per session (plugin or legacy, not both)
Gate: plugin path reaches functional parity and no duplicate capture/recall occurs when compatibility mode is enabled.
18) Validation and Test Strategy
18.1 Unit validation
- normalization, trivial-content filters, parser validation
- contradiction detector
- decision application safety checks
18.2 Integration validation
- remember -> queue -> process -> recall end-to-end
- soft-delete and recovery behavior
- schema migration on representative legacy DB snapshots
18.3 Concurrency validation
- simultaneous identical remembers
- simultaneous conflicting updates
- worker lease contention and lease expiry recovery
18.4 Fault injection
- provider timeout
- malformed model output
- temporary DB lock contention
- process restart during leased job
18.5 Self-healing validation
- inject queue stalls and verify autonomous requeue + lease recovery
- inject index drift and verify autonomous consistency repair
- inject provider outage and verify degrade/recover without data loss
- inject accidental forget in canary and verify recover workflow
- verify kill switch disables autonomous mutations immediately
18.6 OpenClaw integration validation
- plugin lifecycle path validates session-start, user-prompt-submit, session-end, and compaction callbacks
- plugin tools validate search/store/get/list/forget/modify contracts
- compatibility mode validates legacy command hooks continue to function
- mixed-mode tests verify duplicate recall/capture protection
- daemon outage tests verify graceful degradation and recovery behavior
19) Success Metrics
19.1 Quality metrics
- Recall@5 and Recall@10 on labeled recall set.
- nDCG@10 for ranking quality.
- Duplicate creation rate per 1,000 remembers.
- Decision precision/recall for ADD/UPDATE/DELETE against human labels.
- Contradiction handling precision (flagged contradictions that are genuinely conflicting).
- Modify precision: percent of explicit edits applied to intended IDs.
- Forget precision: percent of forget executions deleting only intended IDs.
Target deltas vs baseline:
- Recall@10: +15% relative minimum
- nDCG@10: +10% relative minimum
- duplicate rate: -60% relative minimum
- decision precision: >= 0.90 for UPDATE/DELETE in canary
19.2 Reliability metrics
- End-to-end remember success rate >= 99.9% (raw-save counts as success).
- Queue dead-letter rate <= 0.5% daily.
- Pinned-delete incidents = 0.
- Recovery success for soft-deleted memories >= 99% within retention window.
- Accidental deletion incidents = 0 in canary and GA.
19.3 Latency metrics
For local default provider:
- remember p95 (auto mode): <= 1.2s
- remember p99 (auto mode): <= 2.5s
- fallback raw-save p95 under provider outage: <= 200ms
- recall/search p95: <= 400ms without reranker
- explicit modify p95: <= 300ms (single ID)
- explicit forget p95: <= 250ms (single ID)
19.4 Cost and efficiency metrics
- Median tokens per remember decision path.
- Estimated provider cost per 1,000 remembers.
- Queue processing throughput (jobs/minute) at target concurrency.
- Storage growth per 10,000 remembers.
19.5 Self-healing effectiveness metrics
- Mean time to detect (MTTD) and mean time to recover (MTTR) for queue, provider, and index incidents.
- Autonomous remediation success rate by action type.
- Percent of incidents resolved without human intervention.
- False-remediation rate (action taken but no health improvement).
- Safety incident rate attributable to autonomous actions.
Targets:
- autonomous resolution >= 80% of sev-3 memory incidents
- MTTR improvement >= 50% vs non-autonomous baseline
- false-remediation rate <= 5%
- autonomous-action safety incidents = 0
19.6 OpenClaw integration metrics
- Plugin adoption rate vs legacy command-hook path.
- Duplicate recall/capture incident rate.
- Plugin callback success rate by lifecycle event.
- Tool success rate for explicit modify/forget operations.
- OpenClaw-to-daemon roundtrip latency p95 for hook/tool calls.
Targets:
- plugin adoption >= 90% before deprecating legacy path
- duplicate recall/capture incidents = 0
- callback and tool success rates >= 99%
20) Benchmarking Methodology
20.1 Benchmark datasets
Build three datasets:
- Real anonymized remember/recall pairs from active usage.
- Synthetic gold dataset with known facts, updates, and contradictions.
- Adversarial dataset (ambiguous phrasing, negation, noisy formatting, trivial chatter, malformed JSON-like input).
Build one additional mutation dataset:
- Modify/forget intent dataset with gold target IDs, including near-duplicate memories and pinned-memory edge cases.
Each dataset must include expected retrieval labels and expected update actions for scoring.
20.2 Offline quality benchmark
Method:
- Run baseline pipeline and v2 pipeline on identical dataset snapshots.
- Freeze embedding model and extraction model versions for run comparability.
- Score Recall@k, nDCG, duplicate rate, decision precision/recall.
- Score modify precision and forget precision on mutation dataset.
Output:
- per-dataset report
- aggregate weighted score
- regression diff report by metric and error category
20.3 Online canary benchmark
Method:
- Route small traffic slice to v2 with feature flags.
- Keep baseline as control.
- Compare latency, queue stability, and user-visible recall quality.
- Compare modify/forget safety metrics (false delete, wrong-target edit).
Guardrails:
- automatic rollback if latency or error thresholds breach for fixed window.
20.4 Load and stress benchmark
Scenarios:
- sustained remember traffic
- burst remember traffic
- mixed remember+recall traffic
- provider degraded/unavailable periods
Measurements:
- throughput (RPS/jobs-per-minute)
- queue lag
- sqlite busy rates
- p95/p99 latency
- wrong-target modify rate
- wrong-target forget rate
20.5 Resilience benchmark
Inject failures:
- daemon restart during leased jobs
- provider timeout spikes
- malformed model responses
- temporary file lock contention
Pass criteria:
- no memory loss
- eventual job completion or dead-letter with audit trail
- recovery path functional
- no unrecoverable delete from explicit forget endpoints
20.6 Cost benchmark
Run controlled 10k remember workload per provider mode and collect:
- tokens consumed
- estimated spend
- median and p95 processing latency
- quality score deltas vs local provider
20.7 Self-healing benchmark
Method:
- Run deterministic failure scenarios (queue stall, lease leak, provider outage, index drift, rollback requirement).
- Compare control (alerts only) vs autonomous maintenance mode.
- Measure MTTD, MTTR, successful repairs, and safety outcomes.
Pass criteria:
- autonomous mode meets MTTR and safety targets from section 19.5
- no autonomous action bypasses policy controls
- complete audit chain exists for every autonomous action
20.8 OpenClaw runtime benchmark
Method:
- Run the same scripted conversations through:
- plugin-first runtime path (
@signetai/adapter-openclaw) - legacy command-hook compatibility path
- plugin-first runtime path (
- Measure recall quality parity, capture quality parity, and operation latency.
- Validate explicit modify/forget behavior and audit events.
Pass criteria:
- plugin path is non-inferior on recall/capture quality
- plugin path meets latency and reliability SLOs
- no duplicate actions in compatibility or mixed-mode tests
21) Risks and Mitigations
- Over-locking from naive transaction scope
- Mitigation: strict short transactions + compare-and-set writes.
- Model hallucinated updates/deletes
- Mitigation: pinned hard-block, confidence thresholds, recoverability, history audit.
- Queue backlog under provider degradation
- Mitigation: mode fallback, retry backoff, queue alerts, dead-letter.
- Schema drift across installations
- Mitigation: robust schema detection + additive migration + backup.
- Privacy leakage to remote providers
- Mitigation: local default, redaction, provider allowlist, local-only enforcement.
22) Failure Modes and Guardrails
Each guardrail below is required and has a release gate.
22.1 Unauthorized modify/forget
- Required controls:
- caller identity must be resolved for every mutate request
- role-based policy for
remember,modify,forget,recover - actor identity must be written to history for every mutation
- Release gate:
- policy tests prove unauthorized requests are denied and authorized requests succeed with correct audit actor
22.2 Mass forget blast radius
- Required controls:
- mandatory
previewmode for query forget - max-delete threshold per request
- confirm token required above threshold
- optional dry-run-only policy mode for canary
- mandatory
- Release gate:
- simulated broad forget cannot execute without preview + confirmation, and threshold policy blocks oversized requests
22.3 Wrong-target modify/forget
- Required controls:
- strict ID-based execution after preview selection
- optimistic concurrency via
if_version - per-item result reporting for batch operations
- Release gate:
- mutation benchmark meets wrong-target thresholds for edit/delete
22.4 Human edit override erosion
- Required controls:
manual_overridelock with configurable TTL- inferred LLM updates blocked during lock window unless forced by explicit operator intent
- Release gate:
- tests verify inferred updates do not overwrite locked memories
22.5 Cross-tenant or cross-scope memory leakage
- Required controls:
- mandatory filter scoping (
user_id/agent_id/run_id) on all search/mutation paths - reject unscoped destructive operations unless explicit admin policy
- mandatory filter scoping (
- Release gate:
- tenant isolation tests show no cross-scope read/write/delete leakage
22.6 Prompt injection via stored memory content
- Required controls:
- memory text treated as untrusted input in prompts
- prompt templates isolate instructions from content blocks
- strip or neutralize high-risk control tokens in model context
- Release gate:
- adversarial prompt-injection test suite passes with no policy bypass
22.7 Retry duplication and non-idempotent mutations
- Required controls:
- idempotency keys for explicit mutate endpoints
- compare-and-set updates by version
- dedup key for queued jobs to prevent duplicate processing
- Release gate:
- retry/replay tests show no duplicate UPDATE/DELETE side effects
22.8 Graph drift after modify/forget
- Required controls:
- modify/delete updates
memory_entity_mentionsand relation counts - periodic graph consistency checker job
- modify/delete updates
- Release gate:
- consistency checks show no orphan links after mutation workloads
22.9 Tombstone/history retention gaps
- Required controls:
- separate retention policies for active memory, tombstones, history, and jobs
- purge worker must run in safe order (links -> tombstones -> history)
- Release gate:
- retention tests prove recoverability within SLA and clean purge after expiry
22.10 Low-quality eval labels
- Required controls:
- human-labeled gold set for modify/forget with adjudication
- inter-rater agreement threshold before benchmark acceptance
- Release gate:
- benchmark report includes label quality metrics and passes minimum agreement requirement
22.11 Operational recovery failure
- Required controls:
- tested backup/restore runbook
- tested queue replay from crash state
- tested rollback trigger and procedure
- Release gate:
- game-day drill passes with documented timings and no data loss
22.12 Abuse and anomaly spikes
- Required controls:
- anomaly alerts for delete spikes and wrong-target mutation spikes
- per-caller rate limits for forget/modify
- emergency mutation freeze switch
- Release gate:
- staged abuse simulation triggers alerts and freeze switch behaves as expected
22.13 Runaway or unsafe autonomous remediation
- Required controls:
- bounded action budgets per hour/day and per incident
- mandatory health re-check after each repair step
- automatic halt after repeated ineffective repairs
- human escalation path with full state snapshot
- Release gate:
- chaos test proves loop halts safely on non-improving conditions and escalates correctly
22.14 OpenClaw dual-path execution conflicts
- Required controls:
- runtime arbitration key per session that selects plugin or legacy path
- idempotency keys on capture actions across both paths
- clear precedence rules in config and docs
- Release gate:
- mixed-mode tests prove no duplicate remember/recall/capture actions
23) Implementation Deliverables
- Updated schema and migration set with rollback runbook.
- Durable memory job queue + worker loop semantics.
- Extraction/decision pipeline integrated with remember endpoint.
- Explicit modify/forget API + history and recovery endpoints.
- Graph extraction storage and graph-boosted retrieval.
- Metrics, dashboards, and alert rules.
- Benchmark harness + baseline and canary report templates.
- Agent maintenance plane (diagnostics + repair actions + policy engine).
- Autonomous maintenance benchmark suite and runbooks.
- Operator docs for config, rollout, failure handling, and escalation.
- OpenClaw plugin-first runtime integration with legacy fallback mode.
- OpenClaw migration and deprecation runbook for command-hook path.
24) Final Acceptance Checklist
All must pass:
- schema migration dry run and live run tested
- queue recovery tested across restart
- pinned deletion prevention verified
- soft-delete retention + recover verified
- explicit modify endpoint concurrency guard tested
- explicit forget preview/execute safeguards tested
- offline benchmark meets quality targets
- canary metrics meet latency/reliability targets
- autonomous maintenance gates meet section 19.5 targets
- kill switch and escalation workflows tested end-to-end
- OpenClaw plugin path passes parity and no-duplication gates
- rollout and rollback procedures documented and exercised
25) Immediate Next Steps
- Approve this spec as implementation contract.
- Create phase-level tickets (A through G) with explicit owners.
- Capture baseline benchmark snapshot before first code change.
- Start Phase A behind feature flags, then proceed by gates.
- Define autonomous maintenance action policy and escalation matrix.
- Finalize OpenClaw plugin-vs-legacy arbitration policy and migration timeline.
26) OpenClaw Plugin-First Integration Specification
26.1 Scope and intent
This section defines how OpenClaw wiring fits into the memory pipeline implementation so runtime behavior, safety, and observability remain consistent with the rest of this spec.
26.2 Component responsibilities
@signet/connector-openclaw(install/bootstrap):- patch OpenClaw config entries
- install compatibility hook files
- no long-term ownership of runtime memory policy
@signetai/adapter-openclaw(runtime integration):- lifecycle callbacks to daemon hook endpoints
- runtime tool surface for memory operations
- primary path for remember/recall/modify/forget in OpenClaw
- Signet daemon:
- single source of truth for memory semantics, policy, and audit
- endpoint contract owner for hooks and memory APIs
26.3 Runtime path selection
One path must be active per OpenClaw session:
plugin(preferred)legacy-hook(compatibility)
Selection rules:
- If plugin capability is present and healthy, use plugin.
- If plugin capability is absent, fall back to legacy-hook.
- If both are configured, arbitration enforces one active path and logs the decision.
26.4 Required OpenClaw runtime capabilities
- Lifecycle callbacks:
- session start
- user prompt submit
- session end
- pre-compaction
- compaction complete
- Tool operations:
- memory_search, memory_store, memory_get, memory_list
- memory_modify, memory_forget
- Error handling:
- graceful daemon timeout behavior
- explicit user-visible error summaries
- retry only when idempotency is guaranteed
26.5 Safety and consistency requirements
- No duplicate capture/recall per event.
- Explicit modify/forget must route through the same policy checks and history writes as non-OpenClaw callers.
- Plugin and legacy compatibility paths must emit equivalent audit
fields (
actor,session,request,path).
26.6 Deprecation policy for legacy command-hook path
- Keep compatibility path until plugin adoption and reliability targets are met (section 19.6).
- Announce deprecation window with migration instructions.
- Remove legacy path only after canary and GA windows complete without duplicate-action incidents.
27) Locked Implementation Decisions
These decisions are approved defaults for implementation unless explicitly superseded by a future revision.
27.1 OpenClaw legacy fallback sunset
- Keep legacy command-hook fallback for at least 90 days after plugin-first GA.
- Legacy removal is blocked until section 19.6 targets are met:
- plugin adoption >= 90%
- duplicate recall/capture incidents = 0
- callback and tool success rates >= 99%
27.2 Force-delete authority for pinned memories
- Force-delete of pinned memories is operator-only by default.
- Autonomous agents are denied force-delete unless a future policy explicitly enables it.
- Force-delete requires reason, preview/confirmation flow, and audit event with actor identity.
27.3 Privacy default
- New installs default to
local_only = truefor memory pipeline LLM processing. - Remote providers remain opt-in.
- Existing installs preserve current behavior unless users opt into stricter privacy mode.
27.4 Retention defaults
Default retention windows:
- Soft-deleted memories (tombstones): 30 days.
- Memory history events: 180 days.
- Completed jobs: 14 days.
- Dead-letter jobs: 30 days.
These values are configurable and may be tightened by policy.
Privacy-sensitive install policy:
- History retention must be explicitly chosen during setup (no implicit default).
- Recommended choices are 90 or 180 days based on operator risk posture.
27.5 Autonomous maintenance bounds (GA)
Allowed unattended actions:
- Requeue retryable jobs and release stale leases.
- Run index/consistency checks and non-destructive repairs.
- Run bounded re-embed jobs for model/version drift.
Human-approval-required actions:
- Bulk forget operations.
- Force-delete or pinned deletions.
- Retention purge overrides and destructive rollback operations.
27.6 OpenClaw migration benchmark bar
Plugin-first migration passes only when all are true:
- Plugin path is non-inferior to legacy path on recall/capture quality.
- Plugin path is equal or better on reliability and latency SLOs.
- Duplicate actions remain zero in mixed-mode and canary windows.
28) Competitive Gap Mapping (Supermemory Comparison)
This section maps current observed product gaps (vs references/supermemory/)
to this implementation plan so prioritization stays explicit.
28.1 Current gap categories (as of this draft)
- Memory lifecycle completeness:
- missing shipped explicit
modify/forget/history/recoverendpoints - limited mutation safety tooling in GA path today
- missing shipped explicit
- Ingestion breadth:
- limited native ingest surface compared to URL/file/media/connectors
- Ecosystem integrations:
- thinner framework middleware and SDK ergonomics
- MCP distribution posture:
- less standardized hosted OAuth/API-key style MCP path
- Product analytics:
- less complete usage/error analytics for operator workflows
28.2 What v2 phases close directly
- Phases A-D close most memory lifecycle and safety gaps:
- durable queue + retries
- explicit modify/forget + history + recover
- pinned safety invariants and mutation policy controls
- Phase E closes graph-aware retrieval quality gaps.
- Phase F closes self-healing and operational rigor gaps.
- Phase G closes OpenClaw runtime parity and duplicate-path safety gaps.
28.3 What remains after v2 (out of scope for A-G)
- Broad connector catalog and sync orchestration.
- Framework-native middleware/tooling parity across major app stacks.
- Hosted multi-tenant analytics and scoped auth experience.
29) Post-v2 Expansion Phases (H-K)
These phases begin only after section 24 acceptance criteria pass.
Phase H: Ingestion and Connector Foundation (local-first first)
- Add first-class document ingest contracts for text/URL/file with
asynchronous processing status (
queued/extracting/chunking/embedding/indexing/done/failed). - Introduce connector runtime interface and provider contract: auth, incremental sync, replay, idempotency, provenance.
- Ship initial connector set focused on highest leverage: GitHub docs, local filesystem/project docs, and Google Drive.
- Gate: connector imports are idempotent, auditable, and do not violate local-first defaults.
Phase I: SDK and Agent Integration Surface
- Expand
@signet/sdkto typed memory lifecycle APIs aligned with v2 (remember/recall/modify/forget/history/recover/jobs). - Add framework adapters:
- Vercel AI SDK middleware/tool package
- OpenAI SDK helper package (TypeScript first, Python optional)
- Standardize examples and reference templates for memory injection prompts and safe tool calling.
- Gate: integration examples pass conformance tests and benchmark parity with direct daemon API usage.
Phase J: Auth Scope and Deployment Modes
- Keep localhost/no-auth as default path.
- Add optional authenticated mode for remote/team deployments: scoped API tokens, role policy, and operation-level authorization.
- Add project/agent/user scoping guarantees for read/write/mutate operations.
- Gate: tenant/scope isolation tests pass with zero cross-scope leakage.
Phase K: Analytics and Operator UX
- Add memory pipeline analytics endpoints and dashboard cards: throughput, error taxonomy, queue health, latency, mutation safety.
- Include exportable audit and incident timelines.
- Add benchmark report publishing workflow for canary/GA decisions.
- Gate: operators can diagnose top memory failures without ad-hoc log spelunking.
30) Prioritization Guardrails for H-K
- Do not expand connector breadth before v2 mutation safety gates are green.
- Favor depth over breadth: one production-grade connector beats five partial connectors.
- Preserve local-first defaults in every new surface.
- Any destructive operation must remain previewable, reversible, and fully audited.
- New integration layers must reuse daemon contracts rather than invent parallel semantics.
31) Immediate Planning Actions (next cycle)
- Convert Phases A-G into owner-assigned milestones with explicit acceptance tests per gate.
- Freeze and capture baseline benchmark snapshot before A starts.
- Draft Phase H RFC in parallel (contracts only, no implementation) so post-v2 execution can start without discovery lag.
- Scope
@signet/sdkv2 API surface in parallel with D/E so Phase I is not blocked on SDK design. - Define deployment mode policy matrix now (localhost default, authenticated optional) to de-risk Phase J.
32) Deep Implementation Plan (A-G)
This section converts phases into concrete implementation tracks, component ownership, and sequencing constraints.
32.1 Package ownership map
@signet/coreowns:- schema and migrations API surface
- memory domain types and validators
- extraction/decision contracts
- retrieval scoring and graph boost logic
@signet/daemonowns:- queue worker runtime and job leasing
- API handlers and authorization policy checks
- observability endpoints and health diagnostics
@signet/sdkowns:- typed client surface for all memory lifecycle APIs
- integration-safe wrappers and transport defaults
@signetai/adapter-openclawowns runtime plugin behavior.@signet/connector-openclawowns install/bootstrap and fallback hooks.
32.2 Phase A implementation tracks (infrastructure hardening)
Track A1: Schema and migration runtime
- Add migrations for:
memoriesnew columns (content_hash,normalized_content,is_deleted,deleted_at,pinned,importance,extraction_status, provenance fields)memory_historymemory_jobs- graph tables (
entities,relations,memory_entity_mentions)
- Add migration audit table:
schema_migrations_audit(id, started_at, ended_at, outcome, details)
- Add preflight command/API:
- detect schema state
- produce additive migration plan report
- verify backup existence + checksum
Track A2: DB access and transaction boundaries
- Introduce single daemon DB accessor factory with:
- one write handle
- pooled read handles when beneficial
- Implement explicit transaction wrappers:
txIngestEnvelope()txApplyDecision()txFinalizeAccessAndHistory()
- Enforce no provider call inside write transaction via lintable guard:
- central helper requiring pure DB closures
Track A3: Feature flags and kill switches
- Add config keys:
memory.pipelineV2.enabledmemory.pipelineV2.shadowModememory.pipelineV2.allowUpdateDeletememory.pipelineV2.graph.enabledmemory.pipelineV2.autonomous.enabled
- Add emergency freeze controls:
memory.pipelineV2.mutationsFrozenmemory.pipelineV2.autonomousFrozen
Track A4: Baseline compatibility
- Keep legacy remember/recall behavior default-on until gates pass.
- Add dual-read diagnostics mode:
- compare baseline vs v2 candidate retrieval silently
- log divergence metrics only
Exit criteria for Phase A (must all pass)
- Additive migration succeeds on representative legacy DB snapshots.
- Daemon restarts safely during migration/backfill.
- No regression in existing
/remember+/recalluser path.
32.3 Phase B implementation tracks (shadow extraction)
Track B1: Contract-first extractors
- Implement extractor interface:
extractFactsAndEntities(input): ExtractionResult
- Add validator with hard caps:
- max facts
- max entities/relations
- max fact length
- reject trivial or malformed outputs
- Persist warnings even when extraction fails.
Track B2: Shadow decision engine
- Implement candidate retrieval for each fact (top-K hybrid).
- Implement decision contract parser for
ADD/UPDATE/DELETE/NONE. - Persist proposed decisions to history with
event=NONE+decision_reason, without mutating memories.
Track B3: Job worker + retry behavior
- Worker loop with leasing:
- lease timeout
- stale lease reaper
- exponential backoff with jitter
- Job dedup keys:
- one active extract job per memory/version
- Dead-letter path with machine-readable error codes.
Exit criteria for Phase B
- Parse reliability reaches threshold on real traffic sample.
- Shadow decisions produce explainable audit records.
- Queue remains stable under provider timeouts/outages.
32.4 Phase C implementation tracks (controlled writes: ADD/NONE)
Track C1: ADD write path
- Enable ADD application with idempotency checks:
- exact hash duplicate collapse at DB layer
- Write mandatory history event in same commit as memory mutation.
- Update embedding linkage for newly added memory.
Track C2: Contradiction and duplicate suppression
- Run contradiction detector before destructive recommendations.
- When contradiction risk present:
- block destructive write
- emit review-needed marker
Track C3: Read-path confidence controls
- Exclude low-confidence extracted facts from write path by threshold.
- Keep raw envelope available regardless of quality outcome.
Exit criteria for Phase C
- Duplicate growth is materially reduced from baseline.
- No data-loss incidents under repeated retries/restarts.
32.5 Phase D implementation tracks (full semantic decisions)
Track D1: Explicit mutation APIs
- Implement endpoints from section 14.5-14.8.
- Require
reasonfor all mutate operations. - Add
if_versionoptimistic concurrency guards.
Track D2: Soft-delete and recoverability
- Implement soft-delete semantics in all forget flows.
- Implement recover endpoint with retention-window checks.
- Retention worker enforces purge order: links -> tombstones -> history.
Track D3: Pinned and policy guardrails
- Reject pinned delete unless force + policy-authorized actor.
- Deny autonomous force delete by default (locked decision 27.2).
- Record actor identity + request/session correlation for all mutations.
Exit criteria for Phase D
- Wrong-target edit/delete metrics meet section 22 release gates.
- Recover path passes SLA and audit-chain checks.
32.6 Phase E implementation tracks (graph + optional rerank)
Track E1: Graph extraction persistence
- Persist entities/relations with merge policies.
- Maintain
memory_entity_mentionsas mandatory link table. - On modify/delete, update mention counts and orphan cleanup safely.
Track E2: Query-time graph boost
- Query entity extraction and nearest-entity resolution.
- One-hop expansion and score boost integration:
final = a*vector + b*bm25 + c*access + d*graph. - Graceful fallback to baseline hybrid if entity resolution fails.
Track E3: Optional reranker
- Add reranker hook for top-N only.
- Enforce strict timeout with fallback to pre-rerank ordering.
- Instrument latency overhead separately.
Exit criteria for Phase E
- Recall@10 and nDCG targets met without latency SLO breach.
- Graph consistency checks show no orphan-link drift.
32.7 Phase F implementation tracks (autonomous maintenance)
Track F1: Diagnostics plane
- Add read-only health endpoints for: queue, storage, index, provider, mutation safety.
- Compute health score with sub-scores per domain.
Track F2: Repair action plane
- Implement policy-gated actions: requeue, lease release, reindex checks, bounded reembed.
- Require reason + actor + correlation id on every action.
- Enforce per-action cooldown and hourly/day action budgets.
Track F3: Safety automation loop
- Observe-only mode first (recommendations, no mutation).
- Promote to execute mode only after canary gates.
- Auto-halt on repeated non-improving remediations.
Exit criteria for Phase F
- MTTR improves while maintaining zero autonomous safety incidents.
- Kill switch proven effective immediately.
32.8 Phase G implementation tracks (OpenClaw plugin-first)
Track G1: Runtime path arbitration
- Session-level arbitration key chooses plugin vs legacy path.
- Emit audit field
runtime_pathfor every memory operation. - Enforce one active path per session.
Track G2: Tool and lifecycle parity
- Ensure plugin supports full lifecycle callbacks.
- Ensure plugin exposes memory tool parity: search/store/get/list/modify/forget.
- Route plugin mutations through same daemon policy engine as all callers.
Track G3: Compatibility de-risking
- Keep legacy hooks gated and measurable.
- Add duplicate-action guard keys across both paths.
- Publish migration + deprecation runbook.
Exit criteria for Phase G
- Plugin path non-inferior quality, equal/better reliability/latency.
- Duplicate actions remain zero in mixed-mode canary.
32.9 Dependency-critical sequencing
- A must complete before any B-F mutating behavior.
- B shadow data should run long enough to calibrate thresholds before C.
- D cannot ship before mutation policy and retention worker are complete.
- E graph boost should be enabled only after D mutation correctness gates.
- F execute mode cannot ship before D+E safety baselines are stable.
- G deprecation timeline starts only after F operational maturity.
33) Deep Implementation Plan (H-K)
These phases focus on product surface parity while preserving local-first and mutation safety guarantees from v2.
33.1 Phase H: Ingestion and connector foundation
H1: Document ingest API and lifecycle
- Introduce document-level model:
documentstable with status machine and provenance- mapping from document -> derived memories
- Proposed ingest endpoints:
POST /api/documents(text/url payload)POST /api/documents/file(file upload)GET /api/documents/:id(status + diagnostics)GET /api/documents/:id/chunks(debug/inspection)
- Status machine:
queued -> extracting -> chunking -> embedding -> indexing -> done|failed.
H2: Connector runtime contract
- Define connector interface:
authorize()listResources()syncIncremental(cursor)syncFull()replay(resourceId, fromVersion)
- Required connector metadata:
- provider, tenant scope, cursor, last sync, rate-limit state, error state, provenance tags.
- Shared connector safeguards:
- idempotency keys by source document hash + provider version
- per-connector backoff and dead-letter queue
- replay-safe mutation behavior
H3: Initial connector set (depth-first)
- GitHub docs connector (text docs only first pass).
- Local filesystem docs connector (watch + snapshot modes).
- Google Drive connector as first OAuth source.
H4: Connector operations
- Connector-specific health diagnostics and forced resync API.
- Manual sync + preview impact report before destructive sync actions.
- Connector runbook for quota/permission/replay failures.
Exit criteria for H
- At least one connector is production-grade with replay + idempotency.
- Document ingest lifecycle is visible and debuggable end-to-end.
33.2 Phase I: SDK and framework integrations
I1: @signet/sdk v2 client
- Add typed methods for: remember, recall, modify, forget, history, recover, jobs, documents.
- Add strong request/response schemas and error classes.
- Add transport policies: retries only for idempotent operations by default.
I2: Framework adapters
- Vercel AI SDK adapter:
- context injection middleware
- memory tool bindings
- safety defaults for mutation operations
- OpenAI SDK adapter (TS first):
- helper for tool definitions + execution wrappers
- conversation-scoped memory context helpers
- Optional Python adapter follows after TS parity and adoption signal.
I3: Conformance + examples
- Golden integration tests verify adapters match daemon API semantics.
- Reference examples for chat agents and coding agents.
- Include “safe defaults” templates: preview-first forget, version-guarded modify.
Exit criteria for I
- Adapter flows show non-inferior recall quality vs direct API calls.
- Mutation safety invariants preserved through all adapter layers.
33.3 Phase J: Auth scope and deployment modes
J1: Deployment mode matrix
local-defaultmode:- localhost only
- no mandatory auth
team-authmode (optional):- token auth required
- scoped policy enforcement
hybridmode:- local commands remain frictionless
- remote endpoints require auth and scope
J2: Token and policy model
- Token classes:
- personal full-scope token
- scoped token (project/agent-restricted)
- short-lived session token (optional)
- v1 scope granularity includes project + agent + user dimensions.
- Permission matrix by operation: remember/recall/modify/forget/recover/admin.
- Mandatory scope filters in all query and mutation paths.
J3: Security and abuse controls
- Rate limits for destructive operations.
- Threshold confirmation for bulk forget and force delete.
- Full audit provenance for caller identity and role.
Exit criteria for J
- Isolation tests prove no cross-scope leakage.
- Unauthorized mutation tests pass under all deployment modes.
33.4 Phase K: Analytics and operator UX
K1: Analytics data model
- Usage counters by endpoint, actor, provider, connector.
- Error taxonomy with stage-level codes.
- Latency histograms for remember/recall/mutate/jobs.
K2: API endpoints and dashboard
- Proposed endpoints:
GET /api/analytics/usageGET /api/analytics/errorsGET /api/analytics/logsGET /api/analytics/memory-safety
- Dashboard cards: queue health, dead-letter trend, mutation safety, provider status, connector sync health.
K3: Operator incident workflow
- One-click export of timeline for a request/session/memory id.
- Include policy decisions and automated actions in timeline.
- Add benchmark report publication to release decision checklist.
Exit criteria for K
- Top incident classes diagnosable from dashboard + APIs alone.
- Canary/GA decisions use standard published benchmark artifacts.
34) Detailed Milestone Graph and Suggested Sprinting
Assumes 2-week iterations and parallel work where safe.
34.1 Milestone groups
- M1 (A1-A3): Schema + DB access + flags
- M2 (B1-B3): Shadow extraction + queue stabilization
- M3 (C1-C3): Controlled ADD writes + contradiction controls
- M4 (D1-D3): Explicit mutation APIs + recovery + policy hardening
- M5 (E1-E3): Graph persistence + graph boost + optional rerank
- M6 (F1-F3): Diagnostics + repair actions + autonomous observe mode
- M7 (G1-G3): OpenClaw plugin-first arbitration and parity
- M8 (H1-H4): Document ingest + first production-grade connector
- M9 (I1-I3): SDK v2 + framework adapters + conformance
- M10 (J1-J3 + K1-K3): deployment auth modes + analytics UX
34.2 Parallelism guidance
- M1 and benchmark harness prep can run together.
- M2 shadow mode and API contract stubs for D can run together.
- M5 graph extraction implementation can start during late M4, but feature-flagged off until D gates pass.
- M8 RFC and interface contracts can start during M6/M7.
- M9 SDK surface design should start before M8 completes.
34.3 Suggested de-risking order
- Prioritize mutation correctness before connector breadth.
- Prioritize policy correctness before authenticated deployment.
- Prioritize observability before autonomous execute mode.
35) Engineering Findings and Recommendations
- The strongest competitive leverage is not connector count first; it is trusted memory correctness under mutation and recovery.
- Shipping explicit modify/forget/history/recover (D) is the highest product-perception unlock for parity.
- Graph quality work (E) should stay conservative until mutation safety error rates are well-characterized.
- OpenClaw plugin-first migration (G) is strategic for ecosystem coherence; treat duplicate-action prevention as a hard invariant.
- Post-v2 connector work (H) should be depth-first with one excellent connector before expansion.
- SDK/integration surfaces (I) need conformance tests so wrappers never drift from daemon policy semantics.
- Optional auth modes (J) should not compromise local-default UX.
- Analytics (K) should be considered a reliability feature, not just UX.
36) Operator Interview Outcomes (resolved)
The following implementation decisions were confirmed by operator input and now act as planning defaults.
- Phase H OAuth connector priority:
- Google Drive first.
- Phase J auth scope model for v1:
- include project + agent + user scoping in first authenticated mode.
- Phase I adapter scope:
- defer Python adapter from wave 1; prioritize TypeScript adapter quality and adoption signal.
- Privacy-focused retention policy:
- history retention must be config-required at setup.
- Analytics redaction default:
- configurable at setup with safe presets (not one hard global mode).
- Release-blocking metric priority:
- wrong-target mutation precision takes precedence over latency p95.
36.1 Plan implications from interview
- Phase J complexity increases; allocate extra validation budget for scope-composition and isolation tests.
- Setup UX must include retention + analytics redaction decisions.
- Phase I timeline should assume TS adapter is productionized before Python starts.
- Canary gates should halt rollout on mutation precision regressions even when latency remains within SLO.