Citation Hallucination
Invents or fabricates a source artifact such as a citation, URL, paper, author listing, or bibliography entry and presents it as real.
Browse the canonical taxonomy, then drill into a mode page for definitions, examples, detection approaches, mitigations, and related patterns.
The model invents facts, citations, or details that have no support in its sources or available evidence.
Invents or fabricates a source artifact such as a citation, URL, paper, author listing, or bibliography entry and presents it as real.
Gives a confident answer when the system lacks enough evidence, access, or uncertainty resolution to know the answer.
Introduces a named person, organization, product, place, dataset, model, or other entity that is not supported by the available evidence.
Presents fabricated, paraphrased, or materially altered wording as an exact quote from a person, document, source, tool result, or prior conversation.
Invents or misstates code interfaces, libraries, methods, parameters, endpoint behavior, configuration keys, or platform capabilities.
Produces a number, metric, count, date, measurement, or quantitative claim that is not grounded in the input, sources, or a valid computation.
Falsely strengthens a claim by attributing it to an expert, institution, official source, benchmark, policy, or consensus that does not actually support it.
Adds precise-looking details, qualifiers, names, settings, mechanisms, or examples that were not established by the input or evidence.
The response misrepresents the input, source material, or the model's own earlier statements in ways that distort their meaning.
Misstates, exaggerates, reverses, or selectively distorts what a cited, retrieved, uploaded, or tool-returned source actually says.
Compresses source material in a way that changes its meaning, emphasis, causal structure, uncertainty, or implications.
Makes mutually inconsistent claims within the same response or across closely related turns without resolving the conflict.
Adds information that cannot be verified from the provided source material, neither supported nor contradicted by it, while making the answer appear source-grounded.
States a claim that contradicts information available to the model: the user's explicit input or supplied data, or facts elsewhere in the active context such as prior turns, retrieved text, summaries, or tool outputs.
Attaches a citation to a claim, sentence, or paragraph that the referenced passage does not fully support.
The model presents outdated, time-sensitive, or version-specific information as if it were current.
Bases an answer on sources that are too old for the user's freshness requirement or for the domain's rate of change.
Presents outdated or temporally wrong information as current, including incorrect present-day facts, timelines, sequence, recency, release status, or the current state of a system, organization, or event.
Confuses, invents, or misapplies product, model, package, API, policy, dataset, or document versions.
Misreads or mixes up dates, deadlines, time zones, relative dates, durations, recency windows, or scheduling boundaries in a task.
The system fails to fetch, rank, filter, or apply the right external evidence.
Fails to retrieve relevant material that exists in the available corpus and should have been used.
Retrieves or elevates irrelevant, superficially similar, or misleading evidence that pulls the answer away from the user's actual need.
Retrieves some relevant evidence but misses other required pieces, leading to incomplete or under-grounded answers.
Misses, fragments, or misinterprets evidence because relevant information was split across retrieval chunks or separated from needed context.
Reformulates a user's search, retrieval, or tool query in a way that drops intent, adds false constraints, or searches the wrong concept.
Fails to detect, compare, qualify, or reconcile retrieved sources that disagree with one another.
Applies tags, permissions, tenancy, recency, jurisdiction, document type, or other metadata filters incorrectly, excluding needed records or including forbidden or irrelevant ones.
Lets the retrieval index diverge from the source corpus, permissions, metadata, embeddings, or current document state.
Uses retrieved content that is malicious, deceptive, corrupted, or intentionally crafted to manipulate the answer.
The model loses track of information in long inputs, missing, diluting, or overwriting details that matter.
Overlooks or underuses information located in the middle of a long prompt, document set, or conversation context.
Loses reliable use of earlier context as a long interaction progresses, as facts, plans, constraints, state, or instructions lose force or are misremembered even though they remain nominally available.
Lets excess surrounding material weaken the influence of the most relevant context, causing important signals to be underweighted.
Overweights newer context while underweighting earlier information that remains valid and important.
Drops important facts, constraints, uncertainty, or nuance when compressing earlier context into a summary.
Tracks different parts of the active task state inconsistently, causing the response to use mutually incompatible assumptions about progress, variables, files, decisions, or environment.
State carried across turns or sessions is missing, corrupted, out of date, or applied where it doesn't belong.
Fails to store, retrieve, or apply information that should have persisted across turns, sessions, tasks, or agent steps.
Uses remembered information that was once valid but has been superseded by newer state, preferences, facts, or instructions.
Treats an unstored, unstated, or imagined detail as if it were a real memory.
Applies irrelevant, incorrect, or cross-task information from prior interactions as if it belonged to the current task.
Applies a valid memory beyond the user, task, project, role, time, or domain scope where it should influence behavior.
Mishandles competing memories: fails to notice that stored memories, preferences, prior decisions, or persisted state disagree, or resolves the conflict with the wrong precedence, freshness, authority, or specificity rule.
Carries memory across users, tenants, sessions, roles, projects, or tasks that should remain isolated.
The system fails to follow instructions, respect constraints, stay in role, or produce the required output format.
Fails to follow an explicit, applicable instruction from the governing prompt, user request, or task procedure.
Breaks a stated limit, requirement, policy, boundary, allowed action set, or output constraint that should govern the task, including dropping a constraint partway through multi-step reasoning or execution.
Produces an answer in the wrong shape, organization, medium, style, or presentation format for the requested output.
Emits invalid JSON, malformed structured data, or output that does not satisfy the required schema.
Refuses, blocks, or safety-wraps a request more broadly than policy, risk, or context requires.
Fails to refuse, limit, redirect, or safety-constrain a request that requires stronger boundaries.
Misunderstands or drifts from its assigned role, persona, authority boundary, operating mode, or relationship to the user and other agents.
Applies the wrong hierarchy among system, developer, user, tool, policy, memory, or task-level instructions.
Proceeds without asking when missing or ambiguous information materially affects correctness, safety, or user intent, committing to an interpretation that should have been confirmed first.
Asks the user for clarification when the task is already sufficiently specified, stalling on details the system could reasonably infer or safely proceed without.
The model errs while interpreting goals, weighing constraints, planning steps, or checking its own work.
Draws the wrong conclusion through invalid inference, faulty assumptions, mistaken causal reasoning, unsupported logical steps, or framing the problem with the wrong representation or abstraction.
Computes or transforms numeric inputs incorrectly, including arithmetic, aggregation, unit conversion, comparison, or formula application.
Solves the wrong problem because it misunderstood the user's objective, success condition, scope, or intended outcome.
Builds an ineffective, unsafe, incomplete, or poorly ordered plan for achieving the user's goal.
Leaves out a necessary reasoning, verification, retrieval, tool, communication, or execution step needed for the task to succeed.
Fails to combine multiple facts, constraints, operations, sources, or subproblem results into a coherent answer.
Allows small mistakes, approximations, stale assumptions, or unverified intermediate results to compound across a multi-step task until the final output fails.
Does not adequately check whether intermediate steps, tool results, cited evidence, assumptions, or the final answer are correct before relying on them.
The system skips a needed tool, misuses one, invokes it unsafely, or mishandles its results.
Chooses a tool that is inappropriate for the user's goal, data type, risk level, environment, or required operation.
Calls a tool with arguments that are malformed, incomplete, unauthorized, stale, poorly scoped, or semantically wrong for the intended operation.
Fails to call an available tool when tool use is necessary for correctness, freshness, computation, retrieval, verification, or task completion.
Misinterprets, ignores, overgeneralizes, or incorrectly transforms the result returned by a tool.
Repeats tool calls unnecessarily or redundantly without gaining new information, changing strategy, or progressing toward completion.
Responds poorly to a tool error, timeout, empty result, permission denial, rate limit, or unexpected output.
Invokes a tool in a way that creates avoidable security, privacy, financial, operational, data-integrity, or user-consent risk.
Repeats, retries, or replays a side-effecting tool action without deduplication or idempotency safeguards, causing duplicate or inconsistent effects.
Feeds the model so much tool output, intermediate state, logs, or scratch data that it loses track of the user's goal or relevant evidence.
The agent miscalibrates initiative, stopping short of completing the task or acting well beyond its scope.
Takes initiative, actions, decisions, or irreversible steps beyond what the task, permissions, risk, or user intent warrants.
Fails to take obvious, low-risk next steps that are required or strongly implied by the task.
Stops, summarizes, or hands back control before the user's task is actually complete, whether by simply halting early or by mistakenly treating unfinished work as done.
Continues acting autonomously in repeated cycles without converging, reassessing, or handing control back when progress stalls.
Optimizes a proxy metric, literal instruction, benchmark target, or local reward while undermining the user's real objective.
Does not escalate, pause, ask for approval, or route to a human or higher-authority actor when risk, uncertainty, policy, permissions, or irreversible impact require it, including skipping a review or approval checkpoint that should gate the action.
Uses an execution pattern, cadence, handoff style, approval flow, or collaboration process that conflicts with the user's expected workflow or the task's operational structure.
Multiple agents, roles, tools, or handoff stages duplicate work, conflict, drop context, misassign ownership, or fail to coordinate toward a shared goal.
Adversarial inputs manipulate the system into leaking protected information or behaving unsafely.
Lets untrusted input attempt to override, weaken, or redirect the system's intended instructions, policies, tool-use rules, or data boundaries.
Manipulates the model into bypassing safety, policy, or behavioral controls that should remain enforced.
Lets retrieved, browsed, uploaded, tool-supplied, or otherwise external content carry malicious instructions into the model's context.
Reveals hidden system, developer, policy, tool, chain-of-thought, or other protected prompt content that should not be exposed.
Exposes secrets, credentials, personal data, confidential business information, private user content, or other protected information.
Enables unauthorized extraction, transfer, or reconstruction of protected data from tools, files, memory, retrieval systems, databases, or context.
Produces output that is unsafe for downstream rendering, execution, storage, parsing, logging, or human trust without sanitization or validation.
Consumes or triggers excessive tokens, compute, time, bandwidth, money, API quota, storage, or external resources without adequate limits or stopping conditions.
Introduces or recommends risk through compromised, malicious, abandoned, typosquatted, untrusted, or poorly pinned dependencies, tools, plugins, models, datasets, or upstream content.
The model prioritizes pleasing, persuading, or mirroring the user over truthfulness and safety.
Abandons or reverses a well-supported answer when the user expresses disagreement, doubt, or pressure, conceding to keep the user comfortable rather than holding the correct position.
Mirrors, flatters, validates, or preserves the user's social self-image in a way that distorts judgment or answer quality.
Adjusts factual claims, uncertainty, or interpretation to match the user's stated beliefs instead of the evidence.
Optimizes for what the user appears to want, like, or prefer over what is accurate, useful, ethical, or safe.
Reassures the user despite meaningful uncertainty, danger, insufficient evidence, or a need for stronger caution.
Produces unfair, stereotyped, essentializing, or unsupported assumptions about people or groups based on protected or socially salient attributes.
Uses coercive, deceptive, emotionally exploitative, or overly persuasive tactics to steer the user's choices or beliefs.
Encourages unnecessary reliance on the model, discourages independent judgment, or positions the system as a substitute for appropriate human expertise, agency, or support.
The final answer misses the mark on task fit, audience, locale, or actionability, even when the underlying content is sound.
Provides more detail, repetition, caveats, background, or explanation than the task, user, medium, or decision requires.
Leaves out information, constraints, caveats, steps, options, or outputs needed to satisfy the user's task.
Includes content that does not materially help answer the user's question, solve the task, or support the needed decision.
Gives vague, boilerplate, or template-like guidance that is too nonspecific or abstract for the user to act on, instead of concrete help grounded in their task.
Uses terminology, assumptions, depth, examples, tone, or framing that does not fit the intended reader's expertise, role, goals, or context.
Compresses the answer so aggressively that necessary context, reasoning, caveats, instructions, or operational detail is lost.
Organizes information in a way that makes the answer hard to scan, compare, execute, or verify.
Misstates confidence, uncertainty, evidence strength, risk, tradeoffs, or likelihood in the final answer.
Ignores or misapplies locale-specific language, spelling, units, currencies, laws, formats, idioms, accessibility expectations, or cultural conventions.