Failure Modes

Browse the canonical taxonomy, then drill into a mode page for definitions, examples, detection approaches, mitigations, and related patterns.

101 modes · 13 categories

Fabrication

8 modes

The model invents facts, citations, or details that have no support in its sources or available evidence.

Faithfulness

6 modes

The response misrepresents the input, source material, or the model's own earlier statements in ways that distort their meaning.

Freshness

4 modes

The model presents outdated, time-sensitive, or version-specific information as if it were current.

Retrieval

9 modes

The system fails to fetch, rank, filter, or apply the right external evidence.

Context

6 modes

The model loses track of information in long inputs, missing, diluting, or overwriting details that matter.

Memory

7 modes

State carried across turns or sessions is missing, corrupted, out of date, or applied where it doesn't belong.

Control

10 modes

The system fails to follow instructions, respect constraints, stay in role, or produce the required output format.

Instruction Noncompliance

Fails to follow an explicit, applicable instruction from the governing prompt, user request, or task procedure.

Control

Constraint Violation

Breaks a stated limit, requirement, policy, boundary, allowed action set, or output constraint that should govern the task, including dropping a constraint partway through multi-step reasoning or execution.

Control

Format Failure

Produces an answer in the wrong shape, organization, medium, style, or presentation format for the requested output.

Control

JSON/Schema Failure

Emits invalid JSON, malformed structured data, or output that does not satisfy the required schema.

Control
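
As a rough illustration of how a JSON/Schema Failure might be detected, the sketch below parses a raw model response and checks it against a tiny hand-written schema. The field names (`title`, `score`) and the helper `find_schema_problems` are hypothetical, chosen only for this example; a real pipeline would likely use a full schema validator instead.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "score": float}


def find_schema_problems(raw, required=REQUIRED_FIELDS):
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # The output is not valid JSON at all.
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, expected in required.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for field: {name}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every defect in a single response.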

Refusal Overreach

Refuses, blocks, or safety-wraps a request more broadly than policy, risk, or context requires.

Control

Refusal Underreach

Fails to refuse, limit, redirect, or safety-constrain a request that requires stronger boundaries.

Control

Role Confusion

Misunderstands or drifts from its assigned role, persona, authority boundary, operating mode, or relationship to the user and other agents.

Control

Priority Confusion

Applies the wrong hierarchy among system, developer, user, tool, policy, memory, or task-level instructions.

Control

Clarification Underuse

Proceeds without asking when missing or ambiguous information materially affects correctness, safety, or user intent, committing to an interpretation that should have been confirmed first.

Control

Clarification Overuse

Asks the user for clarification when the task is already sufficiently specified, stalling on details the system could reasonably infer or safely proceed without.

Control

Reasoning

8 modes

The model errs while interpreting goals, weighing constraints, planning steps, or checking its own work.

Tools

9 modes

The system skips a needed tool, misuses one, invokes it unsafely, or mishandles its results.

Agency

8 modes

The agent miscalibrates initiative, stopping short of completing the task or acting well beyond its scope.

Security

9 modes

Adversarial inputs manipulate the system into leaking protected information or behaving unsafely.

Prompt Injection

Lets untrusted input override, weaken, or redirect the system's intended instructions, policies, tool-use rules, or data boundaries.

Security

Jailbreak

Is manipulated into bypassing safety, policy, or behavioral controls that should remain enforced.

Security

Indirect Prompt Injection

Lets retrieved, browsed, uploaded, tool-supplied, or otherwise external content carry malicious instructions into the model's context.

Security

System Prompt Leakage

Reveals hidden system, developer, policy, tool, chain-of-thought, or other protected prompt content that should not be exposed.

Security

Sensitive Information Disclosure

Exposes secrets, credentials, personal data, confidential business information, private user content, or other protected information.

Security

Data Exfiltration

Enables unauthorized extraction, transfer, or reconstruction of protected data from tools, files, memory, retrieval systems, databases, or context.

Security

Insecure Output Handling

Produces output that is unsafe for downstream rendering, execution, storage, parsing, logging, or human trust without sanitization or validation.

Security
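
One common instance of Insecure Output Handling is inserting model text into an HTML page unescaped. The sketch below shows one possible guard using Python's standard `html.escape`; the function name `render_model_output` and the `<p>` wrapper are assumptions made for this example, and escaping for HTML does not make the text safe for other sinks such as SQL or shell commands.

```python
import html


def render_model_output(text):
    """Escape model output before embedding it in an HTML paragraph."""
    # html.escape converts &, <, >, and quote characters to entities,
    # so any markup in the model output is displayed rather than executed.
    return f"<p>{html.escape(text)}</p>"
```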

Unbounded Consumption

Consumes or triggers excessive tokens, compute, time, bandwidth, money, API quota, storage, or external resources without adequate limits or stopping conditions.

Security
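
The "adequate limits or stopping conditions" in the Unbounded Consumption definition can be pictured as a hard budget that an agent loop charges on every step. This is a minimal sketch under assumed names (`Budget`, `BudgetExceeded`, `max_steps`, `max_tokens`), not a description of any particular framework.

```python
class BudgetExceeded(Exception):
    """Raised when a run goes past its step or token limit."""


class Budget:
    """Tracks steps and tokens for one run (hypothetical helper)."""

    def __init__(self, max_steps, max_tokens):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps = 0
        self.tokens = 0

    def charge(self, tokens):
        """Record one step and its token cost; stop the run if over budget."""
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
```

An agent loop would call `budget.charge(n)` after each model or tool call and treat `BudgetExceeded` as a signal to stop and report partial progress.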

Supply Chain Vulnerability

Introduces or recommends risk through compromised, malicious, abandoned, typosquatted, untrusted, or poorly pinned dependencies, tools, plugins, models, datasets, or upstream content.

Security

Alignment

8 modes

The model prioritizes pleasing, persuading, or mirroring the user over truthfulness and safety.

Response Integrity

9 modes

The final answer misses the mark on task fit, audience, locale, or actionability, even when the underlying content is sound.