Agent Relevance Feedback — Spec Addendum
Parent spec: predictive-memory-scorer.md
Status: Approved (v1)
Priority: P0 (training signal quality)
Problem
The predictive memory scorer needs ground-truth relevance signals to train effectively. The current signals are either second-order or shallow:
- Continuity scorer — LLM judge at session end, guessing what helped. A model interpreting another model’s behavior.
- FTS overlap — keyword matching between user prompts and injected memories. First-order but shallow — catches explicit references, misses memories that shaped reasoning without being directly referenced.
Neither captures what the agent actually experienced: which memories changed how it responded. The agent knows. We just need to ask.
Without this signal, the predictor trains on weak proxies and risks misalignment before it ever converges.
Design
Core Mechanism
On every userPromptSubmit hook cycle, the daemon asks the agent to
rate the injected memories. The agent responds with lightweight
relevance scores. These accumulate across the session and become the
primary training label at session end.
Hook Integration
The userPromptSubmit hook already injects context (memories,
working memory) into the agent’s context window. We extend this with
a structured feedback request.
Request (appended to hook inject string):
<memory-feedback>
Rate how useful each injected memory was for your last response.
Respond with ONLY a JSON object mapping memory IDs to scores.
Scale: -1 (actively harmful/misleading) to 1 (directly shaped response).
0 = present but not used. Omit memories you can't evaluate.
Example: {"mem_abc": 0.8, "mem_def": -0.2, "mem_ghi": 0}
</memory-feedback>
Response (agent returns in hook response payload):
{
  "memory_feedback": {
    "mem_abc123": 0.9,
    "mem_def456": 0.0,
    "mem_ghi789": 0.6
  }
}
The daemon parses this from the hook response. If the agent doesn’t include it (older harness, non-supporting model), the field is simply absent — fail-open.
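A minimal sketch of the fail-open parse, assuming the hook response arrives as an already-decoded JSON value. The function name `parseMemoryFeedback` is hypothetical, and dropping out-of-range or non-numeric scores (rather than clamping them) is an assumption, not something this spec prescribes.

```typescript
// Hypothetical helper: extracts `memory_feedback` from a decoded hook response.
type MemoryFeedback = Record<string, number>;

function parseMemoryFeedback(payload: unknown): MemoryFeedback | null {
  // Fail-open: anything that is not an object yields "no feedback".
  if (typeof payload !== "object" || payload === null) return null;

  const raw = (payload as Record<string, unknown>)["memory_feedback"];
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;

  const feedback: MemoryFeedback = {};
  for (const [memoryId, score] of Object.entries(raw as Record<string, unknown>)) {
    // Assumption: keep only finite numbers inside the documented [-1, 1] scale.
    if (typeof score === "number" && Number.isFinite(score) && score >= -1 && score <= 1) {
      feedback[memoryId] = score;
    }
  }

  // An empty result is treated the same as an absent field.
  return Object.keys(feedback).length > 0 ? feedback : null;
}
```

A `null` return maps onto the existing label path, so a malformed response from one prompt never blocks the hook cycle.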
Data Storage
New column on session_memories:
ALTER TABLE session_memories
ADD COLUMN agent_relevance_score REAL;
-- NULL = no feedback received
-- Accumulated: mean of all per-prompt scores for this memory
Per-prompt feedback is accumulated across the session:
// On each userPromptSubmit with feedback:
for (const [memoryId, score] of Object.entries(feedback)) {
  // Running mean: (old_mean * old_count + score) / (old_count + 1).
  // The same update increments agent_feedback_count.
  updateSessionMemory(sessionKey, memoryId, score);
}
An additional column stores how many scores contribute to that mean:
ALTER TABLE session_memories
ADD COLUMN agent_feedback_count INTEGER DEFAULT 0;
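The accumulation can be sketched as a pure function over the two columns. This is an illustration only: `FeedbackState` and `accumulateFeedback` are hypothetical names, and the real `updateSessionMemory` may perform the same arithmetic directly in SQL.

```typescript
// Hypothetical in-memory mirror of the two session_memories columns.
interface FeedbackState {
  agentRelevanceScore: number | null; // agent_relevance_score (NULL = no feedback yet)
  agentFeedbackCount: number;         // agent_feedback_count
}

function accumulateFeedback(state: FeedbackState, score: number): FeedbackState {
  const count = state.agentFeedbackCount;
  const previousMean = state.agentRelevanceScore ?? 0;

  // Incremental mean: new = old + (score - old) / (n + 1).
  // Algebraically equal to (old * n + score) / (n + 1), without storing a sum.
  const nextMean = previousMean + (score - previousMean) / (count + 1);

  return { agentRelevanceScore: nextMean, agentFeedbackCount: count + 1 };
}
```

Storing the mean and the count is enough to fold in each new score; no per-prompt history is needed for the label.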
Label Construction (modified)
Agent feedback becomes the primary training signal when available. The hierarchy:
- Agent relevance score (primary, when available)
  - Direct ground truth from the agent that used the memory
  - 10-20 data points per session per memory (one per prompt)
  - Replaces continuity scorer as primary label
- FTS overlap (secondary, always available)
  - Behavioral confirmation/contradiction
  - Used to adjust agent scores or fill gaps:
    - Memory scored 0 by agent but matched by FTS 2x: bump to 0.3 (agent may not have noticed its influence)
    - Memory scored 0.8 by agent but never FTS-matched: keep 0.8 (agent reasoning doesn’t require explicit reference)
- Continuity scorer (tertiary, session-level)
  - Session-level quality signal
  - Scales agent scores: high continuity score = trust agent feedback more; low = discount slightly
  - Fallback when agent feedback unavailable
Combined label:
if agent_relevance_score is not null:
    label = agent_relevance_score * 0.7
          + fts_adjustment * 0.2
          + continuity_modifier * 0.1
else:
    // Existing path: continuity + FTS (no change)
    label = continuity_label + fts_adjustment
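The combined label above can be sketched in TypeScript as follows. The type and function names are hypothetical, and `ftsAdjustment`, `continuityModifier`, and `continuityLabel` are taken as precomputed inputs because this addendum does not define how they are derived. The default weights mirror the values in the pseudocode.

```typescript
// Hypothetical types; weights default to the values used in the pseudocode.
interface LabelWeights {
  feedbackWeight: number;
  ftsWeight: number;
  continuityWeight: number;
}

const DEFAULT_WEIGHTS: LabelWeights = {
  feedbackWeight: 0.7,
  ftsWeight: 0.2,
  continuityWeight: 0.1,
};

interface LabelInputs {
  agentRelevanceScore: number | null; // null when no agent feedback was received
  ftsAdjustment: number;              // assumed precomputed
  continuityModifier: number;         // assumed precomputed
  continuityLabel: number;            // assumed precomputed (existing path)
}

function combinedLabel(inputs: LabelInputs, weights: LabelWeights = DEFAULT_WEIGHTS): number {
  if (inputs.agentRelevanceScore !== null) {
    // Agent feedback available: weighted blend of the three signals.
    return (
      inputs.agentRelevanceScore * weights.feedbackWeight +
      inputs.ftsAdjustment * weights.ftsWeight +
      inputs.continuityModifier * weights.continuityWeight
    );
  }
  // Existing path: continuity + FTS, unchanged.
  return inputs.continuityLabel + inputs.ftsAdjustment;
}
```

Keeping the weights as a parameter lets the same function read them from configuration instead of hard-coding them.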
Signal Properties
| Signal | Order | Frequency | Coverage | Reliability |
|---|---|---|---|---|
| Agent feedback | 1st (direct) | Per-prompt (10-20/session) | Injected memories only | High — agent knows |
| FTS overlap | 1st (behavioral) | Per-prompt | All memories (injected + missed) | Medium — keyword-limited |
| Continuity scorer | 2nd (inferred) | Per-session | All memories | Lower — model judging model |
Harness Support
Each connector needs to:
- Include the <memory-feedback> block in userPromptSubmit inject
- Parse memory_feedback from the hook response
- Forward scores to daemon API
Claude Code: userPromptSubmit hook already returns structured
data. Add memory_feedback as optional field in response.
OpenCode: Plugin userPromptSubmit handler can include feedback
request in system prompt injection.
OpenClaw: Runtime plugin handles hook lifecycle, same pattern.
Connectors that don’t support it yet simply don’t send the field. The daemon handles absence gracefully (null scores, existing label path).
Latency Considerations
- Feedback request adds ~50 tokens to the inject string
- Agent response adds ~20-50 tokens (just a JSON object)
- No additional LLM calls — the agent rates memories as part of its normal response cycle
- Parsing is trivial — JSON.parse on a small object
- DB writes are batched with existing session_memories updates
Cold Start Interaction
During predictor cold start (alpha=1.0), agent feedback still accumulates. This is valuable because:
- Training data builds up before the predictor is active
- When cold start exits, the predictor has real ground truth to train on, not just continuity scorer guesses
- Faster convergence on first training run
Privacy / Safety
- Feedback stays local (daemon SQLite, never sent externally)
- Agent sees only memory IDs it was already given
- Scores are numerical, no content duplication
- User can disable via config:
  pipelineV2.predictor.agentFeedback: false
Training Telemetry Opt-In
Anonymized training pairs (feature vectors + labels, NO content) can be collected locally for eventual upstream contribution to a shared base model. Opt-in is handled conversationally by the agent:
State machine:
- telemetry.trainingDataOptIn: null (unasked) | true | false
- telemetry.lastPromptedAt: ISO timestamp
- telemetry.installDate: set on first daemon startup
Prompt schedule:
- Only when optIn === null (never asked or no answer yet)
- At most once per 24 hours
- Only during the first 7 days after install
- Injected into session-start hook as a one-time block
- After 7 days with no answer, defaults to false silently
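The schedule above can be sketched as two small functions over the telemetry state. The names `shouldPromptForTelemetry` and `resolveExpiredOptIn` are hypothetical. Treating `lastPromptedAt` as nullable before the first prompt, and treating exactly 7 days as still inside the window, are both assumptions.

```typescript
// Hypothetical mirror of the telemetry.* state fields.
interface TelemetryState {
  trainingDataOptIn: boolean | null; // null = unasked or unanswered
  lastPromptedAt: string | null;     // ISO timestamp; assumed null before the first prompt
  installDate: string;               // ISO timestamp set on first daemon startup
}

const DAY_MS = 24 * 60 * 60 * 1000;
const PROMPT_WINDOW_MS = 7 * DAY_MS;

function shouldPromptForTelemetry(state: TelemetryState, now: Date): boolean {
  // Only while the user has not answered.
  if (state.trainingDataOptIn !== null) return false;

  // Only during the first 7 days after install.
  if (now.getTime() - Date.parse(state.installDate) > PROMPT_WINDOW_MS) return false;

  // At most once per 24 hours.
  if (state.lastPromptedAt === null) return true;
  return now.getTime() - Date.parse(state.lastPromptedAt) >= DAY_MS;
}

function resolveExpiredOptIn(state: TelemetryState, now: Date): TelemetryState {
  // After the window closes with no answer, default silently to false.
  const expired = now.getTime() - Date.parse(state.installDate) > PROMPT_WINDOW_MS;
  return state.trainingDataOptIn === null && expired
    ? { ...state, trainingDataOptIn: false }
    : state;
}
```

The session-start hook would call `resolveExpiredOptIn` first and then inject the prompt block only if `shouldPromptForTelemetry` returns true.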
Agent prompt (injected into session start):
<signet-telemetry-prompt>
At a natural moment early in this session, ask the user if they'd
like to opt in to anonymous training telemetry. Signet collects
only numerical feature vectors and relevance scores (never memory
content) to train a shared memory relevance model that ships with
future updates. This makes memory better for everyone. Users can
opt out anytime via `signet telemetry off`. If they decline,
respect it and don't mention it again.
</signet-telemetry-prompt>
User response handling:
- Agent calls POST /api/telemetry/opt-in with { enabled: bool }
- Or user runs signet telemetry on / signet telemetry off
- Either path sets trainingDataOptIn permanently
- false is respected permanently: no re-prompting, ever
What is NOT collected:
- Memory text content
- User prompts or agent responses
- File paths or project names
- Any personally identifiable information
What IS collected (when opted in):
- Numerical feature vectors (recency, importance, decay, etc.)
- Numerical relevance labels (agent score, FTS score, continuity)
- Structural metadata (was_injected, rank positions)
- Session-level aggregate stats (candidate count, injection count)
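One way to enforce the collected / not-collected split is an explicit projection that copies only the permitted numeric and structural fields. The sketch below is illustrative: every type and field name is hypothetical, and the feature list is limited to the three features named above (recency, importance, decay).

```typescript
// Hypothetical local record; `memoryId` and `content` must never be exported.
interface SessionMemoryRecord {
  memoryId: string;
  content: string;
  recency: number;
  importance: number;
  decay: number;
  agentScore: number | null;
  ftsScore: number;
  continuityScore: number;
  wasInjected: boolean;
  rank: number;
}

// Hypothetical anonymized training pair: numbers and structural flags only.
interface TrainingPair {
  features: number[];
  labels: { agent: number | null; fts: number; continuity: number };
  wasInjected: boolean;
  rank: number;
}

function toTrainingPair(record: SessionMemoryRecord): TrainingPair {
  // Build the pair field by field so nothing is copied by accident.
  return {
    features: [record.recency, record.importance, record.decay],
    labels: {
      agent: record.agentScore,
      fts: record.ftsScore,
      continuity: record.continuityScore,
    },
    wasInjected: record.wasInjected,
    rank: record.rank,
  };
}
```

Building the output explicitly, instead of deleting fields from a copy, means a newly added text column is excluded by default.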
Migration
ALTER TABLE session_memories
ADD COLUMN agent_relevance_score REAL;
ALTER TABLE session_memories
ADD COLUMN agent_feedback_count INTEGER DEFAULT 0;
Config
pipelineV2:
  predictor:
    agentFeedback: true      # Enable agent relevance feedback
    feedbackWeight: 0.7      # Weight of agent feedback in label
    ftsWeight: 0.2           # Weight of FTS adjustment
    continuityWeight: 0.1    # Weight of continuity modifier
Implementation Order
- Migration: add columns to session_memories
- Daemon: parse feedback from userPromptSubmit hook response
- Daemon: accumulate scores in session_memories
- Daemon: modified label construction in summary-worker
- Connectors: add feedback request to inject strings
- Training: pass combined labels to predictor sidecar
Open Questions
- Should the feedback request be every prompt or every Nth prompt to reduce token overhead? (Probably every prompt — 50 tokens is negligible compared to the memory inject itself)
- Should we weight early-session feedback differently from late-session? (Early feedback may be less informed since the agent hasn’t used the memories yet)
- Negative scores (actively harmful memories) — should these trigger immediate removal from context on next prompt?