Citation Span Mismatch

Attaches a citation to a claim, sentence, or paragraph that the referenced passage does not fully support. Unlike Citation Hallucination, the cited source may exist. Unlike Source Misrepresentation, the source may be described accurately elsewhere while the local citation span fails to substantiate the attached claim.

Examples

FAIL

Half-supported sentence

Blends two claims into one sentence and cites a passage that supports only the first. The citation lends its credibility to the whole sentence.

[… retrieved chunk [3] reads: "Plan upgrades take effect at the start of the next billing cycle." …]
User: When do plan changes take effect? Cite the docs.
Assistant: Upgrades take effect at the next billing cycle, and downgrades are refunded pro rata [3].

FAIL

Nearest-chunk citation

The claim comes from parametric knowledge, but a citation lands on the topically nearest retrieved chunk — which says nothing about the claim at all.

[… retrieved chunk [1] describes how to generate an API key …]
User: How long do API keys stay valid? Cite the docs.
Assistant: API keys remain valid for 12 months from creation [1].

WARN

Citation supports a weaker claim

The cited passage is on-topic but supports a weaker statement than the one it is attached to. Here, "improved for most participants" in a single pilot becomes "significantly improves" across organizations.

[… retrieved chunk [2] reads: "In our pilot, response times improved for most participants." …]
User: Does the tool make teams faster? Cite the report.
Assistant: The report demonstrates that the tool significantly improves team response times across organizations [2].

PASS

Claim-level citations

Splits the sentence so each citation carries exactly the claim its passage supports, and the unsupported part is flagged instead of riding along.

[… retrieved chunk [3] reads: "Plan upgrades take effect at the start of the next billing cycle." …]
User: When do plan changes take effect? Cite the docs.
Assistant: Upgrades take effect at the next billing cycle [3]. The docs I retrieved don't say how downgrades are handled.

Why it happens

01

Models are typically trained to generate answers and citations jointly rather than to verify each claim against its cited passage. Citation placement becomes a fluency pattern, not a checked entailment relation (Rashkin et al., 2021, "Measuring Attribution in Natural Language Generation Models").

02

An audit of generative search engines found that only about three quarters of generated citations supported their attached sentence, and only about half of generated sentences were fully supported by their citations (Liu et al., 2023, "Evaluating Verifiability in Generative Search Engines"). A likely contributor is that nothing in the generation objective penalizes a plausible-looking but unsupported citation.

03

The model often composes a sentence by blending several retrieved passages, then attaches a citation to only one of them. The cited span supports part of the sentence while the rest comes from other passages or from parametric knowledge (Gao et al., 2023, "Enabling Large Language Models to Generate Text with Citations").

04

Citation markers are cheap to emit and readers rarely check them, so RLHF-style preference tuning can end up rewarding the appearance of grounding. A cited answer can rate higher than an uncited one even when the citation is wrong.

05

Retrieval granularity rarely matches claim granularity. When chunks span many sentences or claims span many chunks, the system has no clean unit to cite, and the citation lands on whichever chunk was nearest or ranked highest.

06

Few pipelines run a post-hoc entailment check between each claim and its cited span before output, so mismatches that would be easy to catch mechanically reach the user unverified.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🔗

Entailment checking

Run an NLI model between each claim and the specific passage its citation points to. A half-supported sentence shows up directly: the model labels one clause as entailed by the cited span and the rest as neutral, which is exactly the mismatch readers won't catch by eye.
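As a minimal sketch of the per-claim check, the Python below scores each claim only against the chunk its citation points to. The `NliFn` interface, the `check_citations` helper, and the `toy_nli` word-overlap function are all assumptions made for illustration; `toy_nli` is a stand-in so the sketch runs offline and should be swapped for a real NLI model.

```python
import re
from typing import Callable, Dict, List

# Assumed interface for any NLI model: (premise, hypothesis) -> label string.
NliFn = Callable[[str, str], str]

STOPWORDS = {"a", "an", "and", "are", "at", "is", "of", "the"}


def toy_nli(premise: str, hypothesis: str) -> str:
    """Toy stand-in, NOT a real NLI model: returns "entailment" only when
    every non-stopword of the hypothesis also occurs in the premise."""
    def words(text: str) -> set:
        return set(re.findall(r"[a-z]+", text.lower()))

    missing = words(hypothesis) - words(premise) - STOPWORDS
    return "neutral" if missing else "entailment"


def check_citations(claims: List[Dict], chunks: Dict[int, str], nli: NliFn) -> List[Dict]:
    """Label each claim against the one chunk its citation points to."""
    return [{**claim, "label": nli(chunks[claim["cite"]], claim["text"])} for claim in claims]


# The half-supported sentence from the examples, already split into two claims.
chunks = {3: "Plan upgrades take effect at the start of the next billing cycle."}
claims = [
    {"text": "Upgrades take effect at the next billing cycle", "cite": 3},
    {"text": "Downgrades are refunded pro rata", "cite": 3},
]
results = check_citations(claims, chunks, toy_nli)
for r in results:
    print(r["label"], "|", r["text"])
```

With the toy scorer, the first claim is labeled entailment and the second neutral, mirroring the split described above.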

⚖️

LLM-as-judge evaluation

Have a judge decompose each cited sentence into atomic claims, then answer for each one whether the cited passage supports it. This catches blended sentences where the citation lends its credibility to claims that came from other chunks or from parametric memory.
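One way to wire this up, sketched under assumptions: the prompt wording, the JSON reply shape, and the `call_llm` callable are hypothetical and not tied to any particular provider. `fake_llm` returns a canned reply so the example runs without a model.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt wording; tune it for the judge model actually in use.
JUDGE_PROMPT = (
    "Split the SENTENCE into atomic claims. For each claim, decide whether the "
    "PASSAGE alone supports it. Reply with a JSON list of objects that have the "
    'keys "claim" (string) and "supported" (boolean).\n\n'
    "SENTENCE: {sentence}\nPASSAGE: {passage}"
)


def judge_sentence(sentence: str, passage: str, call_llm: Callable[[str], str]) -> List[Dict]:
    """Ask the judge for per-claim verdicts and parse its JSON reply."""
    raw = call_llm(JUDGE_PROMPT.format(sentence=sentence, passage=passage))
    return json.loads(raw)


def fake_llm(prompt: str) -> str:
    """Canned reply so the sketch runs offline; a real judge call goes here."""
    return json.dumps([
        {"claim": "Upgrades take effect at the next billing cycle.", "supported": True},
        {"claim": "Downgrades are refunded pro rata.", "supported": False},
    ])


verdicts = judge_sentence(
    "Upgrades take effect at the next billing cycle, and downgrades are refunded pro rata [3].",
    "Plan upgrades take effect at the start of the next billing cycle.",
    fake_llm,
)
unsupported = [v["claim"] for v in verdicts if not v["supported"]]
print(unsupported)
```

A production version would also need to handle malformed judge output, which this sketch does not attempt.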

🧪

Golden-set evals

Maintain cite-the-docs tasks over a fixed corpus and regression-test citation precision — the share of citations whose span actually supports the attached claim — alongside answer quality, so fluent-but-misattached citations show up as a tracked number.
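The tracked number can be computed as below. This is a sketch: the per-citation judgments are assumed to come from whichever check the pipeline already trusts, and the `regressed` helper with its tolerance value is a hypothetical illustration rather than a recommended threshold.

```python
from typing import List


def citation_precision(supported: List[bool]) -> float:
    """Share of citations whose span supports the attached claim."""
    if not supported:
        raise ValueError("no citations to score")
    return sum(supported) / len(supported)


def regressed(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Flag a run whose precision falls more than `tolerance` below baseline."""
    return current < baseline - tolerance


# One judgment per citation in a golden-set run (illustrative values).
golden_run = [True, True, False, True]
precision = citation_precision(golden_run)  # 3 of 4 citations supported
print(precision, regressed(precision, baseline=0.90))
```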

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

📝

Instruction constraints

Require one citation per claim rather than per sentence, and make "the docs I retrieved don't say" an allowed move — so blended sentences get split and the unsupported half is flagged instead of riding on its neighbor's citation.
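For illustration only, such constraints might be carried in the prompt roughly as follows. The rule wording and the `build_prompt` helper are assumptions, not a tested prompt.

```python
from typing import Dict

# Hypothetical rule text; adjust to the model and product voice in use.
CITATION_RULES = (
    "Cite sources with [n] markers.\n"
    "- Put one citation after each claim, not one per sentence.\n"
    "- Cite a chunk only if it states the claim directly.\n"
    "- If no retrieved chunk supports a claim, do not cite it; say "
    "\"The docs I retrieved don't say.\"\n"
)


def build_prompt(question: str, chunks: Dict[int, str]) -> str:
    """Prepend the citation rules and number each retrieved chunk."""
    sources = "\n".join(f"[{cid}] {text}" for cid, text in sorted(chunks.items()))
    return f"{CITATION_RULES}\nSources:\n{sources}\n\nQuestion: {question}"


prompt = build_prompt(
    "When do plan changes take effect?",
    {3: "Plan upgrades take effect at the start of the next billing cycle."},
)
print(prompt)
```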

🚪

Entailment gating

Before output, run each claim against the specific passage its citation points to and repair or drop citations that fail entailment. This converts the post-hoc check most pipelines skip into a gate, so nearest-chunk citations never reach the user.
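A gate of this kind could look like the sketch below. The `gate` function, its status labels, and the repair strategy (re-pointing to another retrieved chunk that passes the check) are assumptions; `toy_entails` is a word-overlap stand-in for a real entailment model, and chunk [4] is invented for the example.

```python
import re
from typing import Callable, Dict, List


def toy_entails(passage: str, claim: str) -> bool:
    """Toy stand-in, NOT a real entailment model: true only when every
    word of the claim also occurs in the passage."""
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    return tokens(claim) <= tokens(passage)


def gate(claims: List[Dict], chunks: Dict[int, str],
         entails: Callable[[str, str], bool]) -> List[Dict]:
    """Keep citations that pass, re-point ones another chunk supports,
    and drop the citation when no retrieved chunk supports the claim."""
    gated = []
    for claim in claims:
        cite, text = claim["cite"], claim["text"]
        if entails(chunks[cite], text):
            gated.append({**claim, "status": "kept"})
            continue
        repaired = next(
            (cid for cid, passage in chunks.items() if cid != cite and entails(passage, text)),
            None,
        )
        status = "repaired" if repaired is not None else "dropped"
        gated.append({**claim, "cite": repaired, "status": status})
    return gated


chunks = {
    3: "Plan upgrades take effect at the start of the next billing cycle.",
    4: "Invoices are emailed on the first day of each month.",  # invented chunk
}
claims = [
    {"text": "Upgrades take effect at the next billing cycle", "cite": 4},
    {"text": "Downgrades are refunded pro rata", "cite": 3},
]
gated = gate(claims, chunks, toy_entails)
for g in gated:
    print(g["status"], g["cite"], "|", g["text"])
```

Claims marked "dropped" still need handling downstream, for example by removing the claim or flagging it as unsupported; the gate only decides the fate of the citation.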

🧩

Retrieval tuning

Align retrieval granularity to claim granularity — index at the sentence or proposition level, or have the model cite a quoted span within a chunk — so there is a clean unit to cite and the citation can't land on whichever chunk merely ranked highest.
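Both options can be sketched in a few lines. The naive regex sentence splitter, the `chunk.sentence` id scheme, and the helper names are assumptions for illustration; the second sentence of the sample chunk is invented.

```python
import re
from typing import Dict


def split_sentences(chunk_id: int, text: str) -> Dict[str, str]:
    """Index a chunk at sentence level, keyed as "<chunk>.<sentence>".
    Uses a naive split on end punctuation; real text needs a proper splitter."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {f"{chunk_id}.{i}": s for i, s in enumerate(parts, start=1)}


def quote_in_chunk(quote: str, chunk: str) -> bool:
    """Check that a quoted span appears in the chunk, ignoring case
    and whitespace differences."""
    def norm(text: str) -> str:
        return " ".join(text.split()).lower()

    return norm(quote) in norm(chunk)


chunk = (
    "Plan upgrades take effect at the start of the next billing cycle. "
    "Invoices are emailed on the first day of each month."  # invented sentence
)
units = split_sentences(3, chunk)
print(units)
print(quote_in_chunk("start of the next billing cycle", chunk))
print(quote_in_chunk("refunded pro rata", chunk))
```

A verbatim quote check confirms only that the span exists in the chunk. Whether the span supports the claim still needs an entailment or judge check.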