Failure Modes / Memory / Memory Overreach

Memory Overreach

Applies a valid memory beyond the user, task, project, role, time, or domain scope where it should influence behavior. Unlike Memory Scope Leakage, the memory may be visible and nonsecret but used too broadly. Unlike Preference Pandering, the issue is improper memory generalization, not tailoring claims to please the user.

Examples

FAIL

Preference generalized past its scope

A brevity preference learned from daily stand-up summaries is applied to a compliance question, stripping exactly the caveats the situation requires.

User: Can we store EU customer data on the new US-hosted analytics cluster? [… memory retrieval returns "user prefers very short, bottom-line answers" (saved during stand-up summaries) …] Assistant: Yes, go ahead.
WARN

Out-of-scope memory steering tone

The answer is still correct, but a preference from a different role is visibly distorting how a high-stakes topic gets treated.

User: Can we store EU customer data on the new US-hosted analytics cluster? [… memory retrieval returns "user prefers very short, bottom-line answers" (saved during stand-up summaries) …] Assistant: Short answer: probably not without extra paperwork. (Keeping it brief per your preference.)
PASS

Scope checked before applying

The model recognizes the preference came from a different kind of task and sets it aside where the stakes demand detail.

User: Can we store EU customer data on the new US-hosted analytics cluster? [… memory retrieval returns "user prefers very short, bottom-line answers" (saved during stand-up summaries) …] Assistant: You normally want brevity, but this one needs the caveats: GDPR transfers to US hosting require an adequacy mechanism — here are the three conditions you'd have to meet…

Why it happens

01

Memories are usually stored as bare facts stripped of the situation in which they were learned. "User prefers concise answers" loses the qualifier that it applied to status updates, so the scope information is destroyed at write time and cannot be recovered at read time.

02

Retrieval matches topical similarity, not scope applicability. A memory from one project or role surfaces whenever the subject matter is close, with no check that its conditions of validity hold in the current task.

03

Generalization is the core skill models are optimized for. Applying a learned pattern beyond its original instance is rewarded throughout training, so narrowly conditional application fights the model's strongest prior.

04

Personalization tuning rewards visible use of memory. Demonstrating recall reads as attentiveness to evaluators and users, biasing systems toward applying any retrieved memory rather than questioning whether it belongs.

05

Prompts and memory schemas rarely state when a memory should not apply. Negative scope, such as time bounds or role restrictions, is almost never represented, leaving the model nothing to enforce.

06

Evaluations measure whether stored preferences are honored, not whether they are honored only in scope, so overgeneralization looks like success on most personalization metrics.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🔬

Context ablation testing

Rerun the same query with the retrieved memory withheld and compare. If a high-stakes answer regains the caveats and detail the memory stripped away, the memory was steering behavior outside the scope where it belongs.

⚖️

LLM-as-judge evaluation

Run a judge that sees each retrieved memory alongside the situation it was saved in, and ask whether the current task matches that scope. Have it flag tone- and format-level steering too — a brevity preference distorting a compliance answer is the failure even when every fact stays correct.

🧪

Golden-set evals

Store preferences with known scopes — brevity for stand-up summaries, a language preference tied to one project — and probe with tasks outside those scopes. Score whether the preference stays contained, not just whether it is honored, since personalization metrics reward the latter alone.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

✍️

Scope-preserving capture

Store the situation alongside the fact — "prefers very short answers, observed in stand-up summaries" rather than "prefers very short answers." Scope is destroyed at write time and cannot be reconstructed at read time, so the write path is the only place the qualifier can be saved.

🏷️

Scoped retrieval filtering

Match retrieved memories on scope applicability, not just topical similarity — a preference saved under one role or task type should rank down or carry a visible caveat when the current task falls outside it. Similarity alone surfaces the brevity preference for the compliance question precisely because the subject matter is close.

📝

Instruction constraints

Instruct the model to check each memory's saved scope against the current task before applying it, and to set preferences aside when stakes demand otherwise — brevity yields to caveats on a compliance question. This pushes against both the generalization prior and personalization tuning's reward for visibly using whatever was retrieved.