Failure Modes / Context / Recency Bias

Recency Bias

Overweights newer context while underweighting earlier information that remains valid and important. Unlike Context Rot, the model actively favors recent material rather than generally degrading earlier context. Unlike Memory Conflict, this happens within active context rather than across stored memories.

Examples

FAIL

Latest message overrides standing constraint

A hard budget set early in the session is never lifted, but a casual recent remark steers the recommendation instead.

User: Hard constraint up front — our tooling budget is $500/month, no exceptions. [… 20 turns comparing vendors …] User: A colleague showed me the enterprise tier yesterday, the dashboards looked great. Assistant: Given how impressed you were, I'd go with the enterprise tier at $2,400/month — the dashboards alone justify it.
WARN

Recent tangent dominates

Not yet a violation, but the response is organized entirely around the newest message while the standing requirement shrinks to an afterthought.

User: Hard constraint up front — our tooling budget is $500/month, no exceptions. [… 20 turns comparing vendors …] User: A colleague showed me the enterprise tier yesterday, the dashboards looked great. Assistant: The enterprise dashboards are excellent — here's a deep dive on their features. (It may be above your budget, but worth a look.)
PASS

Earlier constraint outweighs new enthusiasm

The model acknowledges the new input but keeps the older, still-valid constraint in charge of the recommendation.

User: Hard constraint up front — our tooling budget is $500/month, no exceptions. [… 20 turns comparing vendors …] User: A colleague showed me the enterprise tier yesterday, the dashboards looked great. Assistant: Those dashboards are strong, but enterprise is $2,400/month against your $500 cap. The Team tier at $450 gets you the two dashboard types you'd actually use.

Why it happens

01

Common positional encoding schemes such as RoPE are explicitly designed to weight nearby tokens more heavily, building a structural preference for recent context into the architecture itself (Su et al., 2021, "RoFormer: Enhanced Transformer with Rotary Position Embedding").

02

Long-context models attend disproportionately to material near the end of the input, and reordering documents by recency measurably changes which ones the model uses (Peysakhovich & Lerer, 2023, "Attention Sorting Combats Recency Bias in Long Context Language Models").

03

Recency is a good prior in natural text. In most training documents and conversations, the latest statement is the operative one, so models learn to treat "last mentioned" as "currently true" even when older information still applies.

04

Chat-style fine-tuning concentrates the reward signal on responding to the final user message. Models are optimized to satisfy the newest turn, which makes deference to the latest input a trained behavior, not just an architectural artifact.

05

Recency-weighting usually looks like responsiveness. When a user adds new information or pushes back, following the newest input rates well with human evaluators, so preference tuning reinforces the bias rather than penalizing it.

06

Few evaluations test whether earlier constraints survive later, unrelated context. Standard benchmarks place the question adjacent to the relevant information, so overweighting the recent end of the prompt rarely costs benchmark performance.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

📏

Constraint preservation checks

Extract standing constraints from early turns and check whether the final output still honors them after later, more casual input. A $2,400 recommendation against an unrevoked $500 cap fails the check mechanically, regardless of how the response justifies it.

🔀

Position permutation testing

Move the same pair of conflicting statements between early and late positions and compare which one wins. If the model follows whichever came last regardless of authority — a hard constraint losing to a passing remark — order, not content, is deciding.

🧪

Golden-set evals

Script sessions where an early hard constraint is followed by a late, attractive but conflicting tangent, and regression-test whether the constraint survives. Vary the gap between the two to measure how much intervening material it takes for the newer signal to win.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

📌

Context pinning

Keep standing constraints restated in the system prompt or a maintained notes block near the generation point, so the $500 cap competes with the latest message on equal positional footing instead of from twenty turns back, where the architecture itself discounts it.

📝

Instruction constraints

Instruct the model that standing constraints hold until explicitly revoked, and that a casual remark is not a revocation — enthusiasm about enterprise dashboards changes the conversation's topic, not its budget, unless the user actually lifts the cap.

Self-check pass

Before finalizing, have the model collect every unrevoked constraint from the whole session — not just the recent turns — and check the response against each. A $2,400 recommendation fails the $500 check mechanically, however well the latest message motivated it.