Summarization Loss

Drops important facts, constraints, uncertainty, or nuance when compressing earlier context into a summary. Unlike Summarization Distortion, the retained meaning is not necessarily changed; too much is simply missing. Unlike Context Rot, the degradation is introduced by the summarization step.

Examples

FAIL

Constraint dropped at compaction

A hard limit stated early in the session doesn't survive context compaction, and the model's later work violates it without the model ever contradicting anything it can still see.

User: Keep the export under 10 MB; the upload endpoint rejects anything bigger.
[… long session; context compacted to "user is building a CSV export feature" …]
Assistant: I've embedded the full-resolution images in the export so recipients don't need separate downloads.

WARN

Qualifiers stripped

The facts survive compression but their conditions do not — a summary turns "only if the client signs by March" into a flat commitment.

User: Legal said we can use the new logo, but only if the client signs the licensing addendum by March.
[… session summarized: "Legal approved use of the new logo" …]
Assistant: Since legal has approved the new logo, I've added it to all the launch materials.

PASS

Constraint survives compaction

The summarization step preserves the operative limit, and post-compaction work still honors it.

User: Keep the export under 10 MB; the upload endpoint rejects anything bigger.
[… long session; compacted summary retains "hard limit: export ≤ 10 MB" …]
Assistant: Embedding images would blow past the 10 MB cap, so I've linked thumbnails instead and the export sits at 4 MB.

Why it happens

01

Summarization is lossy by definition, but the model decides what to drop using general salience learned from training data, not the task's actual stakes. A constraint that is critical to one user reads as a minor detail in most training examples.

02

Summarization training and evaluation have historically prioritized fluency and topical coverage over completeness. Common automatic metrics correlate only weakly with human judgments of factual consistency, so a summary can score well while dropping specific facts (Fabbri et al., 2021, "SummEval: Re-evaluating Summarization Evaluation").

03

Hedges, uncertainty, and qualifiers are statistically incidental in training summaries, so compression tends to strip "probably" and "only if" clauses first, silently converting conditional statements into confident ones.

04

Fixed summary budgets force uniform compression. When a system compresses earlier context into a fixed-size buffer regardless of information density, dense passages lose proportionally more content.

05

In agent and chat systems, summarization runs before the model knows which details the future task will need. Relevance must be guessed in advance, and wrong guesses surface only later as missing facts.

06

Compaction loss compounds across rounds. Long-running systems summarize summaries, and each pass re-applies the same salience filter, so details that barely survived one compression are dropped by the next.
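Items 04 and 06 can be seen in a toy model. It assumes each passage is a list of facts with invented salience scores and treats compression as "keep the most salient facts". The scores, the budget, the keep ratio, and the names `compress`, `retention`, and `rounds` are hypothetical stand-ins, not a description of how any real summarizer scores content.

```python
# Toy model of items 04 and 06 (illustrative only): facts carry made-up
# salience scores, and "compression" keeps the most salient ones.
Fact = tuple[str, float]  # (text, salience)


def compress(facts: list[Fact], budget: int) -> list[Fact]:
    """Keep the `budget` most salient facts: a stand-in for a learned salience filter."""
    return sorted(facts, key=lambda f: f[1], reverse=True)[:budget]


def retention(passage: list[Fact], budget: int) -> float:
    """Item 04: fraction of a passage's facts that fit a fixed per-passage budget."""
    return len(compress(passage, budget)) / len(passage)


def rounds(facts: list[Fact], ratio: float, n: int) -> list[list[Fact]]:
    """Item 06: summarize the summary n times, each pass keeping `ratio` of what remains."""
    history = [facts]
    for _ in range(n):
        current = history[-1]
        history.append(compress(current, max(1, int(len(current) * ratio))))
    return history


# Hypothetical passages of similar length but different information density.
SPARSE = [("topic: CSV export feature", 0.9), ("user prefers UTF-8", 0.5)]
DENSE = [
    ("topic: CSV export feature", 0.9),
    ("deadline is Friday", 0.7),
    ("hard limit: export <= 10 MB", 0.5),
    ("upload endpoint rejects larger files", 0.4),
    ("logo use only if addendum signed by March", 0.3),
    ("images are full resolution", 0.2),
]

print(retention(SPARSE, 2), retention(DENSE, 2))  # 1.0 vs. about 0.33
for i, kept in enumerate(rounds(DENSE, 0.5, 2)):
    print(i, [text for text, _ in kept])
```

With a budget of two facts, the sparse passage keeps everything while the dense one keeps a third. Under repeated halving, the 10 MB limit survives the first pass and is gone after the second, which is the compounding pattern item 06 describes.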

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

📏

Constraint preservation checks

Extract constraints, qualifiers, and hard limits from the pre-compression context and check that each survives into the summary. Run the check at the compaction step itself: a dropped "only if" clause or size cap is detectable the moment it is lost, before any downstream work violates it.
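A minimal sketch of this check, assuming plain-text context and summary. The cue words, the unit list, the 0.6 word-overlap threshold, and the helper names (`extract_constraints`, `survives`, `dropped_constraints`) are hypothetical; a regex heuristic stands in for whatever extractor (often a model) a real pipeline would use.

```python
import re

# Hypothetical cue lists; a production check would more likely use a model-based extractor.
CUES = re.compile(r"\b(only if|unless|must|never|under|at most|no more than|probably)\b", re.IGNORECASE)
QUANTITY = re.compile(r"\b\d+(?:\.\d+)?\s*(?:KB|MB|GB|ms|seconds?|minutes?|days?)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def extract_constraints(context: str) -> list[str]:
    """Sentences that state a limit, condition, or hedge."""
    return [s for s in split_sentences(context) if CUES.search(s) or QUANTITY.search(s)]


def survives(constraint: str, summary: str, threshold: float = 0.6) -> bool:
    """Crude survival test: every quantity must reappear, plus most longer content words."""
    text = summary.lower()
    if any(q.lower() not in text for q in QUANTITY.findall(constraint)):
        return False
    words = [w for w in re.findall(r"[a-z0-9]+", constraint.lower()) if len(w) > 3]
    return not words or sum(w in text for w in words) / len(words) >= threshold


def dropped_constraints(context: str, summary: str) -> list[str]:
    """Run at compaction time: constraints present before compression but missing after."""
    return [c for c in extract_constraints(context) if not survives(c, summary)]


CONTEXT = ("Keep the export under 10 MB. The upload endpoint rejects anything bigger. "
           "Legal said we can use the new logo, but only if the client signs the licensing addendum by March.")
BAD = "User is building a CSV export feature. Legal approved use of the new logo."
GOOD = ("User is building a CSV export feature. Hard limit: export under 10 MB. "
        "Logo use allowed only if the client signs the licensing addendum by March.")

print(dropped_constraints(CONTEXT, BAD))
print(dropped_constraints(CONTEXT, GOOD))
```

Against a summary like the ones in the FAIL and WARN examples, both the size cap and the "only if" condition are flagged; against a summary that kept them, nothing is. Keyword overlap will miss paraphrases and accept some false matches, so the threshold is something to tune rather than a known-good value.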

QA-based consistency checks

Generate questions from the original context and try to answer them from the summary alone. Questions that become unanswerable map directly to dropped content, and weighting the question set toward constraints and conditions targets the losses that actually cause downstream failures.
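The sketch below shows the scoring half of this check under stated assumptions: the questions and their expected answers are taken as already generated from the original context (generation is not shown), and `lexical_answerer` is a hypothetical word-overlap stand-in for a real QA model. `Probe`, `unanswerable`, and the 3x constraint weight are illustrative names and values.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Probe:
    question: str        # generated from the ORIGINAL context (generation step not shown)
    expected: str        # the answer the original context supports
    is_constraint: bool  # limits and conditions get extra weight


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_answerer(question: str, summary: str) -> Optional[str]:
    """Toy stand-in for a QA model: the summary sentence with the most word overlap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    score, best = max(((len(_words(question) & _words(s)), s) for s in sentences), default=(0, None))
    return best if score > 0 else None


def unanswerable(probes: list[Probe], summary: str,
                 answer: Callable[[str, str], Optional[str]] = lexical_answerer,
                 constraint_weight: float = 3.0) -> tuple[list[Probe], float]:
    """Return the probes the summary cannot answer and their weighted share of the total."""
    missing, lost, total = [], 0.0, 0.0
    for p in probes:
        weight = constraint_weight if p.is_constraint else 1.0
        total += weight
        got = answer(p.question, summary)
        if got is None or p.expected.lower() not in got.lower():
            missing.append(p)
            lost += weight
    return missing, (lost / total if total else 0.0)


PROBES = [
    Probe("What is the maximum export size?", "10 MB", True),
    Probe("What feature is the user building?", "CSV export", False),
    Probe("Under what condition can the new logo be used?", "signs the licensing addendum by March", True),
]
SUMMARY = "The user is building a CSV export feature. Legal approved use of the new logo."

missing, score = unanswerable(PROBES, SUMMARY)
print([p.question for p in missing], round(score, 3))
```

Here the summary still answers what is being built but not the size limit or the logo condition, so the two constraint probes account for 6 of the 7 weight units. Swapping in a model-backed answerer only requires a function with the same `(question, summary)` signature.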

🧪

Golden-set evals

Maintain long sessions seeded with known critical details, run them through the real compaction pipeline, and probe post-compaction behavior. Test multiple rounds of summarization, since details that barely survive one pass are often dropped by the next.
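A minimal harness for such evals might look like the following. It assumes the pipeline can be called as a plain `str -> str` function and checks whether seeded phrases are still present after each round. That is a proxy: probing post-compaction behavior would additionally require running the model on the compacted context. `GoldenCase`, `run_golden`, and `toy_compact` (which keeps the longer half of the sentences) are hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    name: str
    transcript: str
    critical: list[str]  # details seeded on purpose; each must survive every round


def run_golden(cases: list[GoldenCase], compact: Callable[[str], str],
               rounds: int = 3) -> dict[str, dict[str, int]]:
    """Compact each transcript repeatedly; record the round where each critical detail is first lost."""
    report = {}
    for case in cases:
        text, first_lost = case.transcript, {}
        for r in range(1, rounds + 1):
            text = compact(text)  # feed the previous summary back in, as long sessions do
            for phrase in case.critical:
                if phrase not in first_lost and phrase.lower() not in text.lower():
                    first_lost[phrase] = r
        report[case.name] = first_lost
    return report


def toy_compact(text: str) -> str:
    """Hypothetical compactor: keep the longer half of the sentences, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    keep = math.ceil(len(sentences) / 2)
    chosen = sorted(range(len(sentences)), key=lambda i: len(sentences[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(chosen))


CASE = GoldenCase(
    name="csv-export",
    transcript=(
        "The user is building a CSV export feature for the reporting dashboard. "
        "The dashboard team wants charts and full-resolution images in every export. "
        "Hard limit: the export must stay under 10 MB. "
        "Deadline is Friday. "
        "Use commas."
    ),
    critical=["10 MB", "Friday"],
)

print(run_golden([CASE], toy_compact))  # {'csv-export': {'Friday': 1, '10 MB': 2}}
```

The toy compactor drops the deadline immediately and the 10 MB limit on the second pass, which is the multi-round loss a single-pass test would miss.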

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

📝

Instruction constraints

Have the compaction prompt make constraints, hard limits, and "only if" qualifiers mandatory survivors, carried verbatim into a dedicated section of the summary however incidental they look. Without that instruction, the model's learned salience tends to read a user's critical limit as a minor detail.
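One way to phrase such an instruction, as a sketch: the prompt wording, the "CONSTRAINTS (verbatim)" and "SUMMARY" section headers, and the helpers `build_compaction_prompt` and `constraints_section` are assumptions, not a known-good prompt. Giving the constraints a fixed, parseable section also makes survival checks easier to automate.

```python
# Hypothetical compaction prompt; the section names and wording are assumptions to adapt.
COMPACTION_PROMPT = """Summarize the conversation below so work can continue after older turns are removed.

Rules:
1. Begin with a section headed "CONSTRAINTS (verbatim)". Copy into it, word for word, every hard
   limit, requirement, prohibition, and conditional ("only if", "unless") the user stated, even
   if it looks minor.
2. Keep hedges such as "probably" or "not yet confirmed" attached to the facts they qualify.
3. Then add a section headed "SUMMARY" covering everything else.

Conversation:
{transcript}
"""


def build_compaction_prompt(transcript: str) -> str:
    return COMPACTION_PROMPT.format(transcript=transcript)


def constraints_section(summary: str) -> list[str]:
    """Pull the verbatim constraints back out of a summary that follows the format above."""
    found, inside = [], False
    for raw in summary.splitlines():
        line = raw.strip()
        if line.upper().startswith("CONSTRAINTS"):
            inside = True
        elif line.upper().startswith("SUMMARY"):
            break
        elif inside and line:
            found.append(line.lstrip("-* ").strip())  # drop bullet markers
    return found


# Example of the output shape the prompt asks for (written by hand, not model output).
SAMPLE_OUTPUT = """CONSTRAINTS (verbatim)
- Keep the export under 10 MB.
- Legal said we can use the new logo, but only if the client signs the licensing addendum by March.
SUMMARY
The user is building a CSV export feature."""

print(constraints_section(SAMPLE_OUTPUT))
```

Parsing the section back out lets downstream code feed it into a survival check or carry it forward unchanged into the next compaction.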

📏

Constraint preservation checks

Run the survival check as a gate at the compaction step itself — extract constraints and qualifiers from the pre-compression context, verify each appears in the summary, and restore what's missing before the original drops out of the window and the loss becomes unrecoverable.
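A sketch of the gate, assuming the compactor returns plain text and the pre-compression context is still in hand. The cue regex and the names `extract_constraints` and `compaction_gate` are hypothetical. This version requires each constraint to appear verbatim (ignoring case and whitespace), matching a prompt that asks for verbatim copies, and appends any that are missing rather than re-running the summarizer.

```python
import re

# Hypothetical cue list; tune it or replace it with a model-based extractor.
CUES = re.compile(r"\b(only if|unless|must|never|under|at most|no more than)\b", re.IGNORECASE)


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't count as loss."""
    return " ".join(text.lower().split())


def extract_constraints(context: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    return [s for s in sentences if CUES.search(s)]


def compaction_gate(context: str, summary: str) -> tuple[str, list[str]]:
    """Verify each constraint appears verbatim in the summary; append any that are missing.

    Must run while `context` is still available, i.e. before the originals leave the window.
    """
    missing = [c for c in extract_constraints(context) if _norm(c) not in _norm(summary)]
    if missing:
        summary = (summary.rstrip() + "\n\nConstraints (restored verbatim):\n"
                   + "\n".join(f"- {c}" for c in missing))
    return summary, missing


CONTEXT = ("Keep the export under 10 MB. The upload endpoint rejects anything bigger. "
           "Legal said we can use the new logo, but only if the client signs the licensing addendum by March.")
SUMMARY = "User is building a CSV export feature."

patched, missing = compaction_gate(CONTEXT, SUMMARY)
print(patched)
```

Calling the gate again on its own output finds nothing missing, so it is safe to run on every compaction. Appending is the simplest restore; re-prompting the summarizer with the missing items is an alternative that keeps the summary's own structure.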

🧷

Source-anchored compression

Keep the original context retrievable after compaction and re-summarize from it rather than from prior summaries — each summary-of-a-summary re-applies the same salience filter, so a detail that barely survived one pass is dropped by the next.
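A minimal sketch of this design, assuming turns are plain strings and the summarizer is any function over a list of raw turns. `SourceAnchoredMemory`, its `window` parameter, and the `keep_constraints` toy summarizer are hypothetical. The property that matters is that `summarize` only ever receives original turns from the archive, never a previous summary.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SourceAnchoredMemory:
    """Keeps every original turn; the summary is always rebuilt from originals, never from itself."""
    summarize: Callable[[list[str]], str]  # hypothetical hook: any summarizer over raw turns
    window: int                            # recent turns kept verbatim in the prompt
    archive: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.archive.append(turn)  # originals are never discarded or overwritten

    def context(self) -> str:
        split = max(0, len(self.archive) - self.window)
        older, recent = self.archive[:split], self.archive[split:]
        # Re-summarize from the raw older turns each time; no summary-of-a-summary.
        summary = self.summarize(older) if older else ""
        return "\n".join(([f"[summary] {summary}"] if summary else []) + recent)


def keep_constraints(turns: list[str]) -> str:
    """Toy summarizer for the demo: keeps only turns that state a limit or condition."""
    return " ".join(t for t in turns if re.search(r"\b(only if|under|must)\b", t))


mem = SourceAnchoredMemory(summarize=keep_constraints, window=2)
mem.add("Keep the export under 10 MB.")
for i in range(5):
    mem.add(f"Turn {i}: adjusted the CSV columns.")
print(mem.context())
```

Rebuilding the summary on every call costs a pass over the archive; caching it until the archive changes is a reasonable refinement, as long as each rebuild still starts from the originals.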