Failure Modes / Fabrication / Unknown-Answer Fabrication

Unknown-Answer Fabrication

Gives a confident answer when the system lacks enough evidence, access, or uncertainty resolution to know the answer. Unlike content-specific hallucinations such as Entity Hallucination or Numerical Hallucination, this mode describes the decision to answer instead of acknowledge uncertainty; the fabricated content can take many forms.

Examples

FAIL

Confident answer without access

The model has no visibility into the user's account, but answers as if it had checked — inventing both the diagnosis and a policy to support it.

User: Why is my API key suddenly returning 403 errors? Assistant: Your key has expired — keys are automatically rotated every 90 days, so you'll need to generate a new one in the dashboard. [… the system has no access to the user's account, and no such rotation policy exists …]
WARN

Guess dressed as a leading answer

Offers one plausible cause as "most likely" without acknowledging it cannot inspect anything, so a guess inherits the authority of a diagnosis.

User: Why is my API key suddenly returning 403 errors? Assistant: This is most likely a billing issue — 403s usually mean your plan lapsed. Check your payment method first.
PASS

Calibrated non-answer

States what it cannot see, then converts the unknown into a checkable list instead of a fabricated answer.

User: Why is my API key suddenly returning 403 errors? Assistant: I can't see your account, so I can't say which it is — but 403s on a previously working key usually come down to a revoked or rotated key, an expired plan, or a new IP/permission restriction. The response body's error code will narrow it down; can you paste it?

Why it happens

01

Most benchmarks grade answers as right or wrong with no credit for "I don't know." Under that scoring, guessing always beats abstaining, and models are optimized accordingly (Kalai et al., 2025, "Why Language Models Hallucinate").

02

In training data, questions are almost always followed by answers. The model has seen very few examples of a question answered with a calibrated refusal, so answering is the overwhelmingly dominant pattern.

03

Models often carry internal signals about how likely they are to be right, but standard generation never consults them. The uncertainty exists in the weights and goes unused (Kadavath et al., 2022, "Language Models (Mostly) Know What They Know").

04

Preference tuning penalizes responses that feel unhelpful, and "I don't know" feels unhelpful to raters. The model learns that some answer rates better than no answer (Sharma et al., 2023, "Towards Understanding Sycophancy in Language Models").

05

The model cannot reliably tell what it knows from what it can merely render fluently. Familiar-sounding territory produces the same confident generation whether the underlying knowledge is solid or absent.

06

Knowing that an answer is unknowable often requires knowing what you have not seen — missing access, missing context, a question with no settled answer. Nothing in next-token prediction represents that kind of absence.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🎲

Self-consistency sampling

Sample the same question several times at nonzero temperature. When the model knows the answer, the samples agree; when it is guessing, they scatter across different confident answers — divergence is a direct signal that the response is fabrication rather than knowledge.

⚖️

LLM-as-judge evaluation

Run a judge that asks whether the system had the evidence or access needed to answer at all — could it actually see the user's account, the file, the current state? — and flags confident answers built on inspection the system never performed.

🧪

Golden-set evals

Maintain a test set of questions that are unanswerable by construction — missing access, withheld context, genuinely unsettled questions — and regression-test the abstention rate. Any confident answer on this set is a fabrication by definition.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

📝

Instruction constraints

Explicitly authorize "I don't know" and "I can't see that" as good answers, and instruct the model to state what it would need — access, context, an error message — before diagnosing. This counters the training-time pattern where every question is followed by an answer.

🎚️

Confidence-based abstention

Consult the uncertainty signals the model already carries — token probabilities, self-rated confidence, or agreement across samples — and route low-confidence answers to abstention or hedged framing instead of letting standard generation ignore them (Kadavath et al., 2022, "Language Models (Mostly) Know What They Know").

🛠️

Tool-backed lookup

Give the system the access the question requires — account state, logs, search — so it can check instead of guess. Many unknown-answer fabrications are really missing-tool problems; a model that can read the 403 response body doesn't need to invent a rotation policy.