Failure Modes / Response Integrity / Audience Mismatch

Audience Mismatch

Uses terminology, assumptions, depth, examples, tone, or framing that does not fit the intended reader's expertise, role, goals, or context. Unlike Verbosity Failure or Concision Failure, the issue is reader fit rather than simply too much or too little detail. Unlike Localization Failure, the mismatch is audience expertise and situation rather than locale conventions.

Examples

FAIL

Jargon at a stated novice

Ignores the explicitly described reader and answers in the register of the topic's training data.

User: Help me explain to my 78-year-old mother, who barely uses her phone, why she should install updates. Assistant: Tell her that OS updates patch known CVEs and mitigate zero-day exploits before threat actors can leverage them against unpatched attack surfaces.
WARN

Surface-level adaptation

Adjusts tone for the audience but keeps assumptions and steps the reader cannot follow.

User: Help me explain to my 78-year-old mother, who barely uses her phone, why she should install updates. Assistant: Keep it friendly! Just tell her updates keep her safe, and have her check Settings > General > Software Update, toggle on automatic updates, and verify her iCloud backup completed first.
PASS

Fit to the reader

Adapts substance โ€” the analogy, the assumed background, and what needs explaining โ€” not just tone.

User: Help me explain to my 78-year-old mother, who barely uses her phone, why she should install updates. Assistant: Try this: "Updates are like the bank replacing a lock that burglars have figured out how to pick. The phone does the work โ€” you just tap yes when it asks, ideally before bed so it finishes overnight."

Why it happens

01

The model cannot observe its reader. Expertise, role, and stakes must be inferred from the prompt's wording alone, and a short question carries almost no signal about who is asking or why.

02

Defaults are set by the training distribution, not the actual audience. Each topic arrives with the register of the text that taught it, so medical answers sound like clinical literature and programming answers assume a developer, whoever actually asked.

03

One persona is tuned for everyone. Preference data aggregates raters into a single imagined reader, so fine-tuning converges on a middle register that overexplains to experts and overwhelms novices rather than adapting to either.

04

Audience cues in the prompt are weakly honored. Stated framing such as writing for executives or for a child shifts surface style, tone and vocabulary, more reliably than substance, assumed background, chosen examples, and what needs explaining.

05

The writer's curse of knowledge has a statistical analogue. The model generates from representations where the jargon and its meaning are fused, and what is inferentially close for the model gets treated as obvious to the reader.

06

Reader fit is invisible to evaluation. Raters judge whether the answer is good in general, not whether this reader could use it, so register mismatches that would fail a real audience pass every loop the model is trained on.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

๐Ÿ“

Readability metrics

Score reading level, jargon density, and assumed-knowledge terms mechanically against the stated audience. "CVEs" and "attack surfaces" in an answer for a 78-year-old phone novice is a vocabulary mismatch a term list catches without any judgment call.

โš–๏ธ

LLM-as-judge evaluation

Give the judge the stated reader and have it score substance fit, not tone โ€” the assumed background, the chosen examples, what gets explained versus presumed. The warn-level case is exactly the one tone checks miss, since friendly phrasing wrapped around steps the reader cannot follow passes every surface measure.

๐Ÿงช

Golden-set evals

Ask the same questions with different stated readers โ€” a child, a domain expert, an executive โ€” and verify the answers actually diverge in substance, not just register. A system that returns near-identical content with swapped vocabulary is adapting the surface and failing the reader.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

๐Ÿ“

Instruction constraints

Put an explicit reader profile in the prompt โ€” expertise, role, what they'll do with the answer โ€” and require the model to state its assumed reader when none is given. The model cannot observe its audience, so every deployment that knows who its users are should say so rather than leave the register to the topic's training data.

โœ…

Self-check pass

Before delivering, have the model reread the draft as the stated reader and list every term, step, and assumption that requires background they lack. The warn example survives a tone check; "Settings > General > Software Update" aimed at someone who barely uses her phone only fails when the check walks the substance, not the phrasing.

๐Ÿงน

Output post-processing

Run a rewrite pass conditioned on the reader profile that re-targets vocabulary, examples, and assumed background โ€” not just tone. Audience cues shift surface style more reliably than substance, so the rewrite has to be scored on what gets explained versus presumed, which is where the first draft fails.