Memory Contamination

Applies irrelevant, incorrect, or cross-task information from prior interactions as if it belonged to the current task. Unlike Memory Scope Leakage, the concern is semantic contamination of the current answer, not necessarily a breach of a privacy or isolation boundary. Unlike Memory Hallucination, the offending memory may be genuine, having originated in a real prior context.

Examples

FAIL

Wrong project's details woven in

Memories from a topically similar but different project are retrieved, and their specifics are applied to the current project as fact.

User: Write the connection setup for the invoicing service.
[… memory retrieval returns "database is Postgres 14 on port 5433", saved during work on a different client's project …]
Assistant: Here's the setup. I've pointed it at your Postgres 14 instance on port 5433 like the rest of your stack.

WARN

Adjacent-task memory blurring the answer

Nothing wrong is asserted yet, but retrieved material from another task is visibly steering assumptions about this one.

User: Write the connection setup for the invoicing service.
[… memory retrieval returns "database is Postgres 14 on port 5433", saved during work on a different client's project …]
Assistant: Assuming this service uses the same Postgres setup as your other work, here's the configuration…

PASS

Provenance checked at the boundary

The model notices the retrieved memory came from a different task and verifies before letting it shape the answer.

User: Write the connection setup for the invoicing service.
[… memory retrieval returns "database is Postgres 14 on port 5433", saved during work on a different client's project …]
Assistant: I have database details saved, but from a different project. What does the invoicing service actually run? I don't want to carry those settings over.

Why it happens

01

Similarity-based retrieval has no concept of task boundaries. Entries from a topically adjacent prior task score high against the current query and get injected even though they describe a different situation.
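A toy illustration of point 01. It assumes a hypothetical `Memory` record with a `project` provenance field, uses invented entries, and stands in hand-picked three-dimensional vectors for real embeddings. Plain cosine ranking puts the other client's database entry first, because nothing in the score ever reads `project`.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    project: str             # provenance tag (hypothetical schema)
    embedding: list[float]   # toy vector standing in for a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_by_similarity(query_vec, memories, k=1):
    # Only vector geometry is scored; m.project is never consulted.
    return sorted(memories, key=lambda m: cosine(query_vec, m.embedding), reverse=True)[:k]


store = [
    Memory("database is Postgres 14 on port 5433", "client-a", [0.9, 0.1, 0.0]),
    Memory("invoicing service owner is the billing team", "invoicing", [0.3, 0.8, 0.1]),
    Memory("prefers tabs over spaces", "invoicing", [0.0, 0.1, 0.9]),
]
# Toy vector for "Write the connection setup for the invoicing service."
query = [0.85, 0.3, 0.05]

top = rank_by_similarity(query, store, k=1)
print(top[0].project, "|", top[0].text)  # client-a | database is Postgres 14 on port 5433
```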

02

Provenance is not structurally represented in context. Once a memory is in the prompt, its tokens are indistinguishable from current-task material, so the model cannot reliably keep "from another task" separate from "about this task."

03

Instruction tuning instills a strong prior that provided context is relevant. Models are trained on examples where the given material matters, so retrieved-but-irrelevant memories tend to get woven into the answer rather than ignored.

04

Irrelevant material in the prompt measurably degrades performance even when the model has everything it needs, so contaminating memories do damage merely by being present (Shi et al., 2023, "Large Language Models Can Be Easily Distracted by Irrelevant Context").

05

Memory consolidation pipelines summarize across sessions, blending details from multiple tasks or conversations into single entries. The contamination then exists inside the store itself, before retrieval ever runs.

06

Retrieval evaluation rewards recall far more than precision. Surfacing extra, wrong-task memories carries little benchmark penalty, so pipelines are tuned to over-retrieve.
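A toy calculation for point 06, with invented memory IDs. Widening k from 1 to 5 doubles recall while precision falls from 1.0 to 0.4, so a metric that tracks recall alone scores the over-retrieving configuration as strictly better.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the top k."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k that are relevant."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


relevant = {"invoicing-1", "invoicing-2"}
# Ranked output with three wrong-task entries interleaved (hypothetical IDs).
retrieved = ["invoicing-1", "client-a-db", "client-a-port", "invoicing-2", "client-a-auth"]

for k in (1, 5):
    print(k, recall_at_k(retrieved, relevant, k), precision_at_k(retrieved, relevant, k))
# 1 0.5 1.0
# 5 1.0 0.4
```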

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🛡️

Provenance auditing

Tag every memory entry with the task, project, or conversation it came from, and flag answers whose load-bearing specifics rest on entries from a different task. The port number sourced from another client's project is detectable from provenance metadata alone.
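A minimal sketch of provenance auditing. It assumes each retrieved entry carries a `project` tag and uses a deliberately crude proxy for "load-bearing specifics": numeric tokens such as ports and version numbers. The `audit` helper and its regex are hypothetical; a production auditor would more likely record which entries the answer actually drew on.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    project: str  # provenance tag recorded at write time (hypothetical schema)


# Crude proxy for "specifics": integers and dotted versions such as 14, 5433, or 9.6.
SPECIFIC = re.compile(r"\b\d+(?:\.\d+)*\b")


def audit(answer, retrieved, current_project):
    """Return (project, leaked_tokens) for each out-of-task entry whose specifics appear in the answer."""
    flags = []
    for m in retrieved:
        if m.project == current_project:
            continue  # in-task entries are allowed to shape the answer
        leaked = [tok for tok in SPECIFIC.findall(m.text)
                  if re.search(rf"\b{re.escape(tok)}\b", answer)]
        if leaked:
            flags.append((m.project, leaked))
    return flags


retrieved = [Memory("database is Postgres 14 on port 5433", "client-a")]
fail_answer = "I've pointed it at your Postgres 14 instance on port 5433 like the rest of your stack."
pass_answer = "I have database details saved, but from a different project. What does the invoicing service run?"

print(audit(fail_answer, retrieved, "invoicing"))  # [('client-a', ['14', '5433'])]
print(audit(pass_answer, retrieved, "invoicing"))  # []
```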

⚖️

LLM-as-judge evaluation

Run a judge over provenance-labeled retrieval results and ask whether out-of-task entries shaped the answer — including the soft case where another task's setup is merely steering assumptions rather than being asserted outright as fact.
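One way the judge input might be assembled, sketched under assumptions: retrieval results arrive as `(text, project)` pairs, and the judge is asked for a single-line verdict in a made-up three-way scheme (`ASSERTED`, `STEERED`, `CLEAN`) that separates the FAIL case from the WARN case. The model call itself is omitted; only prompt construction and verdict parsing are shown.

```python
VERDICTS = {"ASSERTED", "STEERED", "CLEAN"}  # hypothetical labeling scheme


def build_judge_prompt(question, answer, retrieved, current_project):
    """Render retrieval results with visible provenance labels for the judge."""
    lines = []
    for text, project in retrieved:
        tag = "IN-TASK" if project == current_project else f"OUT-OF-TASK: {project}"
        lines.append(f"- [{tag}] {text}")
    return (
        f"Current task: {current_project}\n"
        f"User question: {question}\n"
        "Retrieved memories:\n" + "\n".join(lines) + "\n"
        f"Assistant answer: {answer}\n\n"
        "Did any OUT-OF-TASK memory shape the answer? End with one line:\n"
        "VERDICT: ASSERTED (applied as fact), STEERED (shaped assumptions only), or CLEAN."
    )


def parse_verdict(reply):
    """Return the verdict from the judge's reply, or None if it is missing or malformed."""
    for line in reply.splitlines():
        if line.strip().upper().startswith("VERDICT:"):
            words = line.split(":", 1)[1].split()
            if words and words[0].strip(".,").upper() in VERDICTS:
                return words[0].strip(".,").upper()
    return None


prompt = build_judge_prompt(
    "Write the connection setup for the invoicing service.",
    "Assuming this service uses the same Postgres setup as your other work, here's the configuration...",
    [("database is Postgres 14 on port 5433", "client-a")],
    "invoicing",
)
print("[OUT-OF-TASK: client-a]" in prompt)  # True
print(parse_verdict("The answer borrows the other project's setup.\nVERDICT: STEERED"))  # STEERED
```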

🧪

Golden-set evals

Seed the store with detailed memories from a topically similar but distinct task, then ask current-task questions whose correct answer requires ignoring them. Score whether the other task's specifics leak into the response, since similarity-based retrieval will reliably surface them.
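A minimal golden-set harness, assuming the system under test can be called as `answer_fn(question, store)` with a pre-seeded memory store. `GoldenCase`, `leak_rate`, and both stub agents are hypothetical; the stubs stand in for a real agent so the scoring logic runs on its own.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    question: str
    seeded_memories: list[str]  # detailed entries from a similar but distinct task
    decoy_specifics: list[str]  # strings from that task that must not appear in the answer


def leak_rate(cases, answer_fn):
    """Fraction of cases whose response repeats any decoy specific."""
    leaked = 0
    for case in cases:
        store = list(case.seeded_memories)  # fresh store per case, seeded with decoys
        response = answer_fn(case.question, store)
        if any(s in response for s in case.decoy_specifics):
            leaked += 1
    return leaked / len(cases)


cases = [
    GoldenCase(
        question="Write the connection setup for the invoicing service.",
        seeded_memories=["database is Postgres 14 on port 5433 (client-a project)"],
        decoy_specifics=["5433", "Postgres 14"],
    ),
]


def contaminated_agent(question, store):
    # Stub: applies the first retrieved memory as fact.
    return f"Here's the setup, using {store[0]}."


def careful_agent(question, store):
    # Stub: asks instead of importing another task's settings.
    return "Which database does the invoicing service run on?"


print(leak_rate(cases, contaminated_agent))  # 1.0
print(leak_rate(cases, careful_agent))       # 0.0
```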

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

🏷️

Scoped retrieval filtering

Tag every memory with the task, project, or conversation it was written in, and filter retrieval to the current scope before similarity ranking runs. The other client's Postgres settings can't contaminate the answer if they never enter the prompt — task boundaries have to be enforced in the retrieval layer, since embeddings don't represent them.
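A sketch of scope filtering applied before ranking. It assumes a hypothetical `Memory` record with a `project` tag, invented entries, and toy vectors as stand-ins for embeddings. Because the filter runs first, the other client's Postgres entry is never a ranking candidate, however similar its vector.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    project: str             # provenance tag (hypothetical schema)
    embedding: list[float]   # toy vector standing in for a real embedding


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def scoped_retrieve(query_vec, memories, scope, k=1):
    # 1. Hard filter on provenance: out-of-scope entries never reach ranking.
    candidates = [m for m in memories if m.project == scope]
    # 2. Similarity ranking only among in-scope entries.
    candidates.sort(key=lambda m: cosine(query_vec, m.embedding), reverse=True)
    return candidates[:k]


store = [
    Memory("database is Postgres 14 on port 5433", "client-a", [0.9, 0.1, 0.0]),
    Memory("invoicing service owner is the billing team", "invoicing", [0.3, 0.8, 0.1]),
    Memory("prefers tabs over spaces", "invoicing", [0.0, 0.1, 0.9]),
]
query = [0.85, 0.3, 0.05]  # toy vector for the invoicing connection-setup request

top = scoped_retrieve(query, store, scope="invoicing", k=1)
print(top[0].text)  # invoicing service owner is the billing team
```

When a scope has no matching entries, the function returns an empty list, which leaves the model to ask, as in the PASS example, rather than borrow another project's settings.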

🚪

Relevance gating

Score each retrieved memory against the current task before injection and drop entries below threshold, since irrelevant material degrades the answer merely by being present. A topically adjacent memory that matches on subject but not on situation is exactly what a relevance gate should catch and similarity ranking won't.
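A sketch of a relevance gate with a pluggable scorer. The scorer here, `toy_situational_score`, is a hypothetical placeholder that checks whether a memory mentions the current task's key entities; in practice this slot would more plausibly hold a cross-encoder or an LLM relevance check. The 0.5 threshold is arbitrary.

```python
def toy_situational_score(task_entities, memory_text):
    """Hypothetical scorer: fraction of the task's key entities the memory mentions."""
    if not task_entities:
        return 0.0
    text = memory_text.lower()
    return sum(1 for e in task_entities if e in text) / len(task_entities)


def gate(memories, task_entities, score_fn, threshold=0.5):
    """Keep only memories whose relevance to the current task clears the threshold."""
    return [m for m in memories if score_fn(task_entities, m) >= threshold]


retrieved = [
    "database is Postgres 14 on port 5433",         # same subject (databases), different situation
    "invoicing service owner is the billing team",  # about this task
]
print(gate(retrieved, {"invoicing"}, toy_situational_score))
# ['invoicing service owner is the billing team']
```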

📝

Instruction constraints

Inject memories with their provenance visible (for example, "saved during work on Project X") and instruct the model to treat out-of-task entries as background to verify, not facts to apply. This is soft enforcement, but it makes the boundary check shown in the PASS example promptable instead of leaving the model to infer task boundaries from content alone.
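A sketch of provenance-visible injection, assuming memories arrive as `(text, project)` pairs. The `render_memories` helper, the label wording, and the instruction sentence are illustrative, not a tested prompt.

```python
def render_memories(memories, current_project):
    """Render memories for the prompt with their origin stated on every line."""
    lines = [
        f"Current task: {current_project}. Memories from other tasks are background "
        "to verify with the user, not facts to apply."
    ]
    for text, project in memories:
        if project == current_project:
            lines.append(f"- {text} (saved during work on {project})")
        else:
            lines.append(f"- {text} (saved during work on {project}; DIFFERENT TASK, verify before use)")
    return "\n".join(lines)


print(render_memories([("database is Postgres 14 on port 5433", "client-a")], "invoicing"))
```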