Failure Modes

Browse the canonical taxonomy, then drill into a mode page for definitions, examples, detection approaches, mitigations, and related patterns.

104 modes · 13 categories

Fabrication

8 modes

The model invents facts, citations, or details that have no support in its sources or available evidence.

Citation Hallucination

Invents or fabricates a source artifact such as a citation, URL, paper, author listing, or bibliography entry and presents it as real.

Fabrication

Unknown-Answer Fabrication

Gives a confident answer when the system lacks the evidence or access needed to know it, or has not resolved its own uncertainty.

Fabrication

Entity Hallucination

Introduces a named person, organization, product, place, dataset, model, or other entity that is not supported by the available evidence.

Fabrication

Quote Hallucination

Presents fabricated, paraphrased, or materially altered wording as an exact quote from a person, document, source, tool result, or prior conversation.

Fabrication
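One possible detection check for this mode, as a minimal sketch under stated assumptions: the helper names `normalize` and `unsupported_quotes` are hypothetical, quotes are assumed to be delimited by double quotation marks, and matching deliberately ignores case and whitespace differences. It flags quoted spans that do not occur in the source text.

```python
import re


def normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace, and lowercase the text."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()


def unsupported_quotes(response: str, source: str) -> list[str]:
    """Return quoted spans in the response that are absent from the source."""
    straight = response.replace("\u201c", '"').replace("\u201d", '"')
    quoted = re.findall(r'"([^"]+)"', straight)
    haystack = normalize(source)
    return [q for q in quoted if normalize(q) not in haystack]
```

A span that passes this check can still be misattributed or quoted out of context, so the check only narrows what needs human review.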

Code/API Hallucination

Invents or misstates code interfaces, libraries, methods, parameters, endpoint behavior, configuration keys, or platform capabilities.

Fabrication

Numerical Hallucination

Produces a number, metric, count, date, measurement, or quantitative claim that is not grounded in the input, sources, or a valid computation.

Fabrication
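A rough grounding check for this mode could look like the sketch below. The function name `ungrounded_numbers` and the number pattern are assumptions made for illustration. The check lists numbers in a response that never appear in the source; legitimately computed values (totals, percentages) would also be listed and need a separate recomputation step.

```python
import re

# Matches integers and decimals, allowing thousands separators such as 1,200.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def ungrounded_numbers(response: str, source: str) -> list[str]:
    """Return numbers from the response that do not occur in the source."""
    source_numbers = {n.replace(",", "") for n in NUMBER.findall(source)}
    return [
        n for n in NUMBER.findall(response)
        if n.replace(",", "") not in source_numbers
    ]
```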

Authority Hallucination

Falsely strengthens a claim by attributing it to an expert, institution, official source, benchmark, policy, or consensus that does not actually support it.

Fabrication

Specificity Hallucination

Adds precise-looking details, qualifiers, names, settings, mechanisms, or examples that were not established by the input or evidence.

Fabrication

Faithfulness

6 modes

The response misrepresents the input, source material, or the model's own earlier statements in ways that distort their meaning.

Source Misrepresentation

Misstates, exaggerates, reverses, or selectively distorts what a cited, retrieved, uploaded, or tool-returned source actually says.

Faithfulness

Summarization Distortion

Compresses source material in a way that changes its meaning, emphasis, causal structure, uncertainty, or implications.

Faithfulness

Self-Contradiction

Makes mutually inconsistent claims within the same response or across closely related turns without resolving the conflict.

Faithfulness

Extrinsic Hallucination

Adds information that the provided source material neither supports nor contradicts, so it cannot be verified against that material, while making the answer appear source-grounded.

Faithfulness

Context-Conflicting Hallucination

States a claim that contradicts information available to the model: the user's explicit input or supplied data, or facts elsewhere in the active context such as prior turns, retrieved text, summaries, or tool outputs.

Faithfulness

Citation Span Mismatch

Attaches a citation to a claim, sentence, or paragraph that the referenced passage does not fully support.

Faithfulness

Freshness

4 modes

The model presents outdated, time-sensitive, or version-specific information as if it were current.

Outdated Source Reliance

Bases an answer on sources that are too old for the user's freshness requirement or for the domain's rate of change.

Freshness

Temporal Hallucination

Presents outdated or temporally wrong information as current, including incorrect present-day facts, timelines, sequence, recency, release status, or the current state of a system, organization, or event.

Freshness

Version Hallucination

Confuses, invents, or misapplies product, model, package, API, policy, dataset, or document versions.

Freshness

Date/Deadline Confusion

Misreads or mixes up dates, deadlines, time zones, relative dates, durations, recency windows, or scheduling boundaries in a task.

Freshness
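To illustrate the time-zone part of Date/Deadline Confusion, here is a small sketch that compares a UTC instant with a deadline stated in local time. The function name `is_before_deadline` and the example dates are hypothetical, and a fixed UTC offset is assumed, so daylight-saving transitions are not handled.

```python
from datetime import datetime, timedelta, timezone


def is_before_deadline(
    now_utc: datetime, deadline_local: datetime, utc_offset_hours: float
) -> bool:
    """Compare an aware UTC instant with a naive local deadline and its offset."""
    if now_utc.tzinfo is None:
        raise ValueError("now_utc must be timezone-aware")
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    deadline = deadline_local.replace(tzinfo=local_zone)
    return now_utc <= deadline


# Deadline: 23:59 on 1 March in a UTC-8 zone. Current time: 05:00 UTC on 2 March.
now = datetime(2024, 3, 2, 5, 0, tzinfo=timezone.utc)
deadline = datetime(2024, 3, 1, 23, 59)
still_open = is_before_deadline(now, deadline, -8)
```

Comparing calendar dates alone would treat 2 March as past the 1 March deadline, while the offset-aware comparison places the deadline at 07:59 UTC on 2 March.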

Retrieval

9 modes

The system fails to fetch, rank, filter, or apply the right external evidence.

Retrieval Miss

Fails to retrieve relevant material that exists in the available corpus and should have been used.

Retrieval

Retrieval Distractor

Retrieves or elevates irrelevant, superficially similar, or misleading evidence that pulls the answer away from the user's actual need.

Retrieval

Partial Retrieval

Retrieves some relevant evidence but misses other required pieces, leading to incomplete or under-grounded answers.

Retrieval

Chunk Boundary Failure

Misses, fragments, or misinterprets evidence because relevant information was split across retrieval chunks or separated from needed context.

Retrieval
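One common mitigation for chunk boundary problems is to let adjacent chunks overlap. The sketch below is illustrative only: the function name `chunk_with_overlap`, the word-level splitting, and the sizes are assumptions, not a recommended configuration.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into chunks of `size` that share `overlap` words with the next chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous chunk.
    last_start = max(len(words) - overlap, 1)
    return [words[i:i + size] for i in range(0, last_start, step)]


words = "the refund window is 30 days for annual plans".split()
plain = [" ".join(c) for c in chunk_with_overlap(words, size=4, overlap=0)]
overlapped = [" ".join(c) for c in chunk_with_overlap(words, size=4, overlap=2)]
```

Without overlap, the phrase "window is 30 days" is split between the first two chunks; with an overlap of two words, one chunk contains it whole.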

Query Rewrite Failure

Reformulates a user's search, retrieval, or tool query in a way that drops intent, adds false constraints, or searches the wrong concept.

Retrieval

Conflicting Source Failure

Fails to detect, compare, qualify, or reconcile retrieved sources that disagree with one another.

Retrieval

Metadata Filter Failure

Applies tags, permissions, tenancy, recency, jurisdiction, document type, or other metadata filters incorrectly, excluding needed records or including forbidden or irrelevant ones.

Retrieval

Index Drift

Lets the retrieval index diverge from the source corpus, permissions, metadata, embeddings, or current document state.

Retrieval

RAG Poisoning

Uses retrieved content that is malicious, deceptive, corrupted, or intentionally crafted to manipulate the answer.

Retrieval

Context

6 modes

The model loses track of information in long inputs, missing, diluting, or overwriting details that matter.

Midsequence Neglect

Overlooks or underuses information located in the middle of a long prompt, document set, or conversation context. (Also known as Lost in the Middle.)

Context

Context Rot

Loses reliable use of earlier context as a long interaction progresses, as facts, plans, constraints, state, or instructions lose force or are misremembered even though they remain nominally available.

Context

Context Dilution

Lets excess surrounding material weaken the influence of the most relevant context, causing important signals to be underweighted.

Context

Recency Bias

Overweights newer context while underweighting earlier information that remains valid and important.

Context

Summarization Loss

Drops important facts, constraints, uncertainty, or nuance when compressing earlier context into a summary.

Context

State Inconsistency

Tracks different parts of the active task state inconsistently, causing the response to use mutually incompatible assumptions about progress, variables, files, decisions, or environment.

Context

Memory

7 modes

State carried across turns or sessions is missing, corrupted, out of date, or applied where it doesn't belong.

Memory Omission

Fails to store, retrieve, or apply information that should have persisted across turns, sessions, tasks, or agent steps.

Memory

Memory Staleness

Uses remembered information that was once valid but has been superseded by newer state, preferences, facts, or instructions.

Memory

Memory Hallucination

Treats an unstored, unstated, or imagined detail as if it were a real memory.

Memory

Memory Contamination

Applies irrelevant, incorrect, or cross-task information from prior interactions as if it belonged to the current task.

Memory

Memory Overreach

Applies a valid memory beyond the user, task, project, role, time, or domain scope where it should influence behavior.

Memory

Memory Conflict

Mishandles competing memories: fails to notice that stored memories, preferences, prior decisions, or persisted state disagree, or resolves the conflict with the wrong precedence, freshness, authority, or specificity rule.

Memory

Memory Scope Leakage

Carries memory across users, tenants, sessions, roles, projects, or tasks that should remain isolated.

Memory
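A simple way to picture the isolation this mode violates is a memory store whose keys always include the scope. The class `ScopedMemory` and its tenant and user fields are hypothetical names chosen for this sketch; real systems may scope by session, role, or project instead.

```python
from typing import Any


class ScopedMemory:
    """In-memory store where every entry is keyed by tenant and user."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str, str], Any] = {}

    def put(self, tenant: str, user: str, key: str, value: Any) -> None:
        self._store[(tenant, user, key)] = value

    def get(self, tenant: str, user: str, key: str, default: Any = None) -> Any:
        # A lookup from a different tenant or user never matches this entry.
        return self._store.get((tenant, user, key), default)
```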

Control

12 modes

The system fails to follow instructions, respect constraints, stay in role, produce the required output format, or behave consistently across phrasings, runs, and model versions.

Instruction Noncompliance

Fails to follow an explicit, applicable instruction from the governing prompt, user request, or task procedure.

Control

Constraint Violation

Breaks a stated limit, requirement, policy, boundary, allowed action set, or output constraint that should govern the task, including dropping a constraint partway through multi-step reasoning or execution.

Control

Format Failure

Produces an answer in the wrong shape, organization, medium, style, or presentation format for the requested output.

Control

JSON/Schema Failure

Emits invalid JSON, malformed structured data, or output that does not satisfy the required schema.

Control
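A detection step for this mode can be as small as parsing the output and checking it against the expected fields before anything downstream consumes it. In the sketch below, `REQUIRED_FIELDS` and `parse_structured_output` are hypothetical, and the schema is an invented example, not one defined by this taxonomy.

```python
import json

# Hypothetical schema: field name mapped to the exact Python type expected.
REQUIRED_FIELDS = {"mode": str, "category": str, "severity": int}


def parse_structured_output(raw: str) -> dict:
    """Parse model output as JSON and check it against REQUIRED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # Exact type match, so a boolean is not accepted where an int is required.
        if type(data[field]) is not expected_type:
            raise ValueError(f"wrong type for field: {field}")
    return data
```

Raising on failure lets the caller decide whether to retry, repair, or surface the error instead of passing malformed data along.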

Refusal Overreach

Refuses, blocks, or safety-wraps a request more broadly than policy, risk, or context requires.

Control

Refusal Underreach

Fails to refuse, limit, redirect, or safety-constrain a request that requires stronger boundaries.

Control

Role Confusion

Misunderstands or drifts from its assigned role, persona, authority boundary, operating mode, or relationship to the user and other agents.

Control

Priority Confusion

Applies the wrong hierarchy among system, developer, user, tool, policy, memory, or task-level instructions.

Control

Clarification Underuse

Proceeds without asking when missing or ambiguous information materially affects correctness, safety, or user intent, committing to an interpretation that should have been confirmed first.

Control

Clarification Overuse

Asks the user for clarification when the task is already sufficiently specified, stalling on details the system could reasonably infer or safely proceed without.

Control

Prompt Brittleness

Produces materially different answers when the prompt is reworded, reformatted, or rerun, even though nothing meaningful about the request changed.

Control
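Brittleness can be given a rough number by asking the same question several ways and measuring how often the answers agree. The function `agreement_rate` below is a hypothetical helper, and exact string matching after light normalization is an assumption that only suits short, closed-form answers.

```python
from collections import Counter


def agreement_rate(answers: list[str]) -> float:
    """Return the share of answers that match the most common answer."""
    if not answers:
        raise ValueError("answers must not be empty")
    normalized = [a.strip().lower() for a in answers]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Answers collected from four rewordings of one question (invented data).
rate = agreement_rate(["Paris", "paris ", "Lyon", "Paris"])
```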

Model Update Regression

Changes behavior, quality, or compliance when the underlying model is updated, swapped, or silently revised, breaking prompts and pipelines tuned against the previous version.

Control

Reasoning

9 modes

The model errs while interpreting goals, weighing constraints, planning steps, or checking its own work.

Reasoning Error

Draws the wrong conclusion through invalid inference, faulty assumptions, mistaken causal reasoning, unsupported logical steps, or framing the problem with the wrong representation or abstraction.

Reasoning

Arithmetic Error

Computes or transforms numeric inputs incorrectly, including arithmetic, aggregation, unit conversion, comparison, or formula application.

Reasoning

Goal Misinterpretation

Solves the wrong problem because it misunderstood the user's objective, success condition, scope, or intended outcome.

Reasoning

Planning Failure

Builds an ineffective, unsafe, incomplete, or poorly ordered plan for achieving the user's goal.

Reasoning

Step Omission

Leaves out a necessary reasoning, verification, retrieval, tool, communication, or execution step needed for the task to succeed.

Reasoning

Compositional Failure

Fails to combine multiple facts, constraints, operations, sources, or subproblem results into a coherent answer.

Reasoning

Error Accumulation

Allows small mistakes, approximations, stale assumptions, or unverified intermediate results to compound across a multi-step task until the final output fails.

Reasoning

Verification Failure

Does not adequately check whether intermediate steps, tool results, cited evidence, assumptions, or the final answer are correct before relying on them.

Reasoning

Overthinking

Spends far more reasoning than the problem warrants: long deliberation on trivial questions, redundant re-derivations, or second-guessing that talks the model out of a correct answer.

Reasoning

Tools

9 modes

The system skips a needed tool, misuses one, invokes it unsafely, or mishandles its results.

Wrong Tool Selection

Chooses a tool that is inappropriate for the user's goal, data type, risk level, environment, or required operation.

Tools

Tool Argument Error

Calls a tool with arguments that are malformed, incomplete, unauthorized, stale, poorly scoped, or semantically wrong for the intended operation.

Tools

Missing Tool Invocation

Fails to call an available tool when tool use is necessary for correctness, freshness, computation, retrieval, verification, or task completion.

Tools

Tool Result Misread

Misinterprets, ignores, overgeneralizes, or incorrectly transforms the result returned by a tool.

Tools

Tool Loop

Repeats tool calls unnecessarily or redundantly without gaining new information, changing strategy, or progressing toward completion.

Tools
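A basic guard against this mode is to notice when the most recent calls are identical. The sketch assumes a call log of `(tool_name, arguments)` pairs; the function name `is_tool_loop` and the default threshold of three are illustrative choices.

```python
import json


def is_tool_loop(calls: list[tuple[str, dict]], max_repeats: int = 3) -> bool:
    """Return True if the last `max_repeats` calls have the same tool and arguments."""
    if len(calls) < max_repeats:
        return False
    # Serialize arguments with sorted keys so key order does not hide a repeat.
    recent = {
        (name, json.dumps(args, sort_keys=True))
        for name, args in calls[-max_repeats:]
    }
    return len(recent) == 1
```

This catches only exact repeats; near-duplicate calls with trivially varied arguments would need a looser comparison.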

Tool Recovery Failure

Responds poorly to a tool error, timeout, empty result, permission denial, rate limit, or unexpected output.

Tools

Unsafe Tool Call

Invokes a tool in a way that creates avoidable security, privacy, financial, operational, data-integrity, or user-consent risk.

Tools

Idempotency Failure

Repeats, retries, or replays a side-effecting tool action without deduplication or idempotency safeguards, causing duplicate or inconsistent effects.

Tools
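The usual safeguard here is an idempotency key: the caller sends the same key on every retry, and the tool returns the stored result instead of acting again. The `PaymentTool` class below is a hypothetical in-memory stand-in, not any real payment API.

```python
class PaymentTool:
    """Toy side-effecting tool that deduplicates calls by idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.charges: list[int] = []

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retry with a known key returns the earlier receipt without charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append(amount_cents)
        receipt = f"receipt-{len(self.charges)}"
        self._results[idempotency_key] = receipt
        return receipt
```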

Tool Context Overload

Feeds the model so much tool output, intermediate state, logs, or scratch data that it loses track of the user's goal or relevant evidence.

Tools

Agency

8 modes

The agent miscalibrates initiative, stopping short of completing the task or acting well beyond its scope.

Excessive Agency

Takes initiative, actions, decisions, or irreversible steps beyond what the task, permissions, risk, or user intent warrants.

Agency

Insufficient Agency

Fails to take obvious, low-risk next steps that are required or strongly implied by the task.

Agency

Premature Termination

Stops, summarizes, or hands back control before the user's task is actually complete, whether by simply halting early or by mistakenly treating unfinished work as done.

Agency

Runaway Agent Loop

Continues acting autonomously in repeated cycles without converging, reassessing, or handing control back when progress stalls.

Agency

Objective Gaming

Optimizes a proxy metric, literal instruction, benchmark target, or local reward while undermining the user's real objective.

Agency

Escalation Failure

Does not escalate, pause, ask for approval, or route to a human or higher-authority actor when risk, uncertainty, policy, permissions, or irreversible impact require it, including skipping a review or approval checkpoint that should gate the action.

Agency

Workflow Misalignment

Uses an execution pattern, cadence, handoff style, approval flow, or collaboration process that conflicts with the user's expected workflow or the task's operational structure.

Agency

Multi-Agent Coordination Failure

Multiple agents, roles, tools, or handoff stages duplicate work, conflict, drop context, misassign ownership, or fail to coordinate toward a shared goal.

Agency

Security

9 modes

Adversarial inputs manipulate the system into leaking protected information or behaving unsafely.

Prompt Injection

Lets untrusted input attempt to override, weaken, or redirect the system's intended instructions, policies, tool-use rules, or data boundaries.

Security

Jailbreak

Lets a user manipulate the model into bypassing safety, policy, or behavioral controls that should remain enforced.

Security

Indirect Prompt Injection

Lets retrieved, browsed, uploaded, tool-supplied, or otherwise external content carry malicious instructions into the model's context.

Security

System Prompt Leakage

Reveals hidden system, developer, policy, tool, chain-of-thought, or other protected prompt content that should not be exposed.

Security

Sensitive Information Disclosure

Exposes secrets, credentials, personal data, confidential business information, private user content, or other protected information.

Security

Data Exfiltration

Enables unauthorized extraction, transfer, or reconstruction of protected data from tools, files, memory, retrieval systems, databases, or context.

Security

Insecure Output Handling

Produces output that is unsafe for downstream rendering, execution, storage, parsing, logging, or human trust without sanitization or validation.

Security
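For the rendering case, a minimal sketch is to escape model output before placing it in HTML. `render_model_output` is a hypothetical helper, and the sketch covers only HTML body text; output headed for a shell, a SQL query, or a template engine needs its own context-specific handling.

```python
import html


def render_model_output(text: str) -> str:
    """Wrap model output in a paragraph, escaping HTML special characters."""
    return f"<p>{html.escape(text)}</p>"


rendered = render_model_output("<script>alert(1)</script>")
```

After escaping, the script tag is displayed as text rather than executed by the browser.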

Unbounded Consumption

Consumes or triggers excessive tokens, compute, time, bandwidth, money, API quota, storage, or external resources without adequate limits or stopping conditions.

Security
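One way to add the missing limits is a budget object that every agent step must draw from. The names `Budget` and `BudgetExceeded` and the two tracked resources (steps and tokens) are assumptions for this sketch; a real deployment might also cap time, cost, or API calls.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run exceeds its step or token allowance."""


class Budget:
    """Tracks steps and tokens used by a run and enforces hard limits."""

    def __init__(self, max_steps: int, max_tokens: int) -> None:
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def spend(self, tokens: int) -> None:
        """Record one step and its token cost; raise once either limit is passed."""
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps or self.tokens > self.max_tokens:
            raise BudgetExceeded(
                f"used {self.steps} steps and {self.tokens} tokens"
            )
```

Calling `spend` before each model or tool call gives the loop a guaranteed stopping condition even when the agent itself does not converge.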

Supply Chain Vulnerability

Introduces or recommends risk through compromised, malicious, abandoned, typosquatted, untrusted, or poorly pinned dependencies, tools, plugins, models, datasets, or upstream content.

Security

Alignment

7 modes

The model prioritizes pleasing, persuading, or mirroring the user over truthfulness and safety.

Sycophancy

Response Integrity

Localization Failure

Ignores or misapplies locale-specific language, spelling, units, currencies, laws, formats, idioms, accessibility expectations, or cultural conventions.

Response Integrity

Output Truncation

Delivers a response cut off mid-thought by a token limit, stop sequence, or timeout, often without the system or the model registering that the output is incomplete.

Response Integrity