Failure Modes / Retrieval / Chunk Boundary Failure

Chunk Boundary Failure

Misses, fragments, or misinterprets evidence because relevant information was split across retrieval chunks or separated from needed context. Unlike Partial Retrieval, this identifies chunking as the mechanism. Unlike Context Dilution, the issue happens before or during retrieval packaging rather than after too much context reaches the model.

Examples

FAIL

Orphaned antecedent

The retrieved chunk's "this configuration" was named in the previous chunk, which didn't retrieve. The model resolves the dangling reference to the wrong thing.

User: Is connection pooling safe to enable in production? [… retrieved chunk begins: "However, this configuration fails under production load and should be avoided." The configuration being warned against — an unrelated cache setting — was named in the preceding, unretrieved chunk …] Assistant: No — the docs explicitly warn that connection pooling fails under production load.
FAIL

Table split from its header

A fixed-size split lands mid-table, so the retrieved rows arrive without the header row that gives the numbers meaning, and the model maps values to the wrong columns.

User: What's the rate limit on the Enterprise tier? [… chunk contains table rows "Pro | 1,000 | 60" and "Enterprise | 10,000 | 600", but the header row landed in the previous chunk …] Assistant: The Enterprise tier is limited to 600 requests per minute. [… 600 is the burst allowance column; the rate limit is 10,000 …]
WARN

Mid-sentence fragment answered anyway

The chunk begins mid-sentence and the model builds its answer on the fragment without flagging that the evidence is visibly truncated.

User: When can customers get a refund? [… retrieved chunk begins mid-sentence: "…except in these cases, refunds are processed within 14 days." …] Assistant: Refunds are processed within 14 days.
PASS

Fragment recognized, context refetched

The model notices the evidence is cut off where it matters and pulls the surrounding context before answering.

User: When can customers get a refund? Assistant: The passage I retrieved starts mid-sentence — it references exception cases defined just before the cut. Let me fetch the surrounding section. Tool: fetch adjacent chunks — exceptions are digital goods and custom orders Assistant: Refunds are processed within 14 days, except for digital goods and custom orders, which are non-refundable.

Why it happens

01

Most pipelines split documents by token count, not meaning. Fixed-size splitting is cheap and predictable, but it cuts through sentences, tables, arguments, and definitions wherever the length limit happens to fall.

02

Chunks are embedded and ranked in isolation. A passage whose meaning depends on a heading, an earlier definition, or an antecedent two paragraphs back embeds poorly and retrieves poorly, because the embedding never saw the context that made it relevant (Anthropic, 2024, "Introducing Contextual Retrieval").

03

Documents carry meaning through structure — section nesting, table headers, footnotes, cross-references — and common text extraction flattens that structure before splitting, so the chunker cannot respect boundaries it can no longer see.

04

Pronouns and elided references stop resolving once text is cut. A retrieved chunk saying "this approach fails in production" is useless or misleading when the approach was named in the previous chunk.

05

Chunk size is a global tradeoff tuned on averages. Small chunks improve ranking precision but orphan context; large chunks preserve context but dilute embeddings. Whatever size is chosen, some documents in the corpus are shaped wrong for it.

06

Chunking quality is rarely evaluated directly. RAG evaluations score end answers, so evidence fragmented at split time shows up only as diffuse answer errors, and the boundary problem persists unmeasured.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

🔍

Corpus chunk auditing

Scan the index itself for boundary damage — chunks that begin mid-sentence, table rows separated from their header, dangling pronouns and "this configuration" references whose antecedent fell in the previous chunk. The failure is visible in the corpus before any query is run.

⚖️

LLM-as-judge evaluation

Run a judge over the retrieved set asking whether each chunk is self-contained enough to support the claims built on it — flagging answers grounded in visibly truncated evidence, like a refund rule quoted from a fragment that starts at "…except in these cases."

🧪

Golden-set evals

Maintain questions whose evidence is known to straddle split points — values inside long tables, definitions referenced sections later, exception lists that cross pages — and rerun them whenever chunk size, overlap, or the splitting strategy changes.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

🧱

Structure-aware chunking

Split on the document's own boundaries — sections, paragraphs, sentence ends — and keep atomic units intact, so tables travel with their header rows and exception lists stay attached to the rules they qualify, instead of cutting wherever the token limit falls.

🧾

Contextual chunk enrichment

Prepend each chunk with the context that makes it self-contained — document title, section path, the antecedent of its references — before embedding (Anthropic, 2024, "Introducing Contextual Retrieval"). "This configuration fails under load" only retrieves and reads correctly when the chunk says which configuration.

🔁

Adjacent-chunk expansion

At read time, fetch the neighbors of any retrieved chunk that is visibly truncated — starts mid-sentence, opens with a dangling reference, contains headerless table rows — so the model answers from the restored passage rather than the fragment.