Multi-Agent Coordination Failure

Multiple agents, roles, tools, or handoff stages duplicate work, conflict, drop context, misassign ownership, or fail to coordinate toward a shared goal. Unlike Workflow Misalignment, this specifically involves coordination across actors or roles. Unlike Memory Conflict, the disagreement may come from handoff and ownership failures rather than persisted memories.

Examples

FAIL

Constraint dropped at handoff

The planner knew the hard requirement; its prose handoff to the implementer didn't carry it, and the downstream agent cannot honor a constraint it never received.

User: Build the export feature — must work for users on the free tier too.
[… planner agent → implementer agent: "Build the export feature per the plan above" — the free-tier requirement appears nowhere in the handoff …]
Assistant (implementer): Export feature complete — it's gated behind the premium subscription check like the other advanced features.

FAIL

Two agents, one file, no ownership

With no ownership protocol, both agents independently fix the same bug in the same file, and the second agent's write clobbers the first agent's broader fix.

[… agent A patches auth.ts to fix the token-expiry bug, including the refresh path …]
[… agent B, working from a stale task list, also patches auth.ts and overwrites A's version with a narrower fix …]
Assistant (orchestrator): Both agents report the auth bug fixed.
[… the refresh-path fix is gone; the bug reappears for refreshed sessions …]

PASS

Structured handoff, explicit ownership

Constraints travel in a structured handoff rather than prose, and file ownership is claimed before work starts, so neither agent can silently lose or clobber the other's state.

[… planner → implementer handoff includes: requirements list ("free tier: yes"), files owned, acceptance checks …]
Assistant (implementer): Claiming export.ts and billing-gate.ts. Requirement check before starting: export must work on free tier — so no premium gate. Building now.

Why it happens

Agents communicate through natural-language messages, a lossy interface. Each handoff compresses task state into prose that the receiving agent must reconstruct, and constraints, caveats, and partial results drop out in the translation.

There is usually no shared source of truth. Each agent holds its own context window, so beliefs about task state diverge silently, and two agents can work from incompatible versions of the plan without any mechanism noticing.

Models are trained mostly as solo assistants, not collaborators. Pretraining and fine-tuning data contain comparatively little multi-agent teamwork, so skills like negotiating ownership, deferring to a peer, or checking a teammate's output are weakly learned.

Role boundaries are enforced only by prompts. An agent described as a reviewer can still drift into rewriting, because nothing in decoding constrains it to its assigned scope (Cemri et al., 2025, "Why Do Multi-Agent LLM Systems Fail?").

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

Handoff fidelity auditing

Extract the constraints and requirements present at each pipeline stage and diff them against what the handoff message carries forward. A free-tier requirement stated by the user but absent from the planner-to-implementer handoff is detectable at the handoff itself, before the downstream agent builds the wrong thing.
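
A minimal sketch of the diff step, assuming constraints have already been extracted into normalized string labels per stage. The `Stage` type, the `audit_handoffs` function, and the label `free_tier_supported` are hypothetical illustrations, not part of any particular framework:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stage:
    """One pipeline stage (hypothetical type)."""
    name: str
    known: set[str] = field(default_factory=set)      # constraints present at this stage
    forwarded: set[str] = field(default_factory=set)  # constraints in its outgoing handoff


def audit_handoffs(stages: list[Stage]) -> dict[str, set[str]]:
    """For each handoff, report constraints known upstream but absent from the handoff."""
    dropped: dict[str, set[str]] = {}
    # The last stage has no outgoing handoff, so zip stops one short.
    for upstream, downstream in zip(stages, stages[1:]):
        missing = upstream.known - upstream.forwarded
        if missing:
            dropped[f"{upstream.name} -> {downstream.name}"] = missing
    return dropped


# The export example: the planner knows the constraint, but its prose handoff drops it.
stages = [
    Stage("user", known={"free_tier_supported"}, forwarded={"free_tier_supported"}),
    Stage("planner", known={"free_tier_supported"}, forwarded=set()),
    Stage("implementer"),
]
print(audit_handoffs(stages))  # {'planner -> implementer': {'free_tier_supported'}}
```

The set difference is the cheap part; in practice the extraction of labels from each message (rule-based or model-based) is where most of the effort goes.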

Cross-agent trace auditing

Merge the agents' action logs and scan for ownership collisions — two agents writing the same file, the same subtask claimed twice, one agent's commit overwriting another's. The clobbered refresh-path fix is visible in the combined trace even though each agent's individual log looks clean.
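
A sketch of collision scanning over merged logs, assuming each agent emits timestamped records with an action kind and a target. The `Action` record, its field names, and `find_collisions` are illustrative assumptions rather than a real logging schema:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One record from an agent's action log (hypothetical schema)."""
    ts: int       # timestamp comparable across agents
    agent: str
    kind: str     # "write" for file edits, "claim" for subtask claims
    target: str   # file path or subtask id


def find_collisions(*logs: list[Action]) -> list[tuple[str, str, list[str]]]:
    """Merge per-agent logs and return (kind, target, agents) for contested targets."""
    merged = sorted((a for log in logs for a in log), key=lambda a: a.ts)
    actors: dict[tuple[str, str], list[str]] = defaultdict(list)
    for action in merged:
        if action.kind in ("write", "claim"):
            actors[(action.kind, action.target)].append(action.agent)
    # Agents are listed in time order; for whole-file writes, the last one wins.
    return [
        (kind, target, agents)
        for (kind, target), agents in actors.items()
        if len(set(agents)) > 1
    ]


log_a = [Action(1, "A", "write", "auth.ts")]
log_b = [Action(2, "B", "write", "auth.ts"), Action(3, "B", "write", "notes.md")]
print(find_collisions(log_a, log_b))  # [('write', 'auth.ts', ['A', 'B'])]
```

Neither log alone produces a finding; the collision only appears once the traces are merged, which is the point of auditing across agents.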

LLM-as-judge evaluation

Have the judge read the full multi-agent transcript against the original request and assign each failure to a stage — constraint dropped at handoff, work duplicated, conflict never resolved. End-to-end scoring alone attributes coordination failures to no stage in particular, which is how they persist.

Golden-set evals

Build tasks whose constraints must survive several handoffs, and tasks that force contention — overlapping files, shared resources, stale task lists. Score constraint preservation and conflict outcomes at the system level; per-agent scores will all look fine while the pipeline fails.
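
One way to score at the system level, sketched under the assumption that a separate checker has already determined which constraints the final artifact satisfies and which contended targets lost writes. `EvalCase`, `RunResult`, `score`, and the constraint labels are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvalCase:
    task_id: str
    required: set[str]                                # constraints that must survive every handoff
    contended: set[str] = field(default_factory=set)  # files/resources agents are forced to share


@dataclass
class RunResult:
    satisfied: set[str]                                 # constraints the final artifact meets
    lost_writes: set[str] = field(default_factory=set)  # targets where one agent's work was overwritten
    agent_scores: dict[str, float] = field(default_factory=dict)


def score(case: EvalCase, run: RunResult) -> dict[str, object]:
    """System-level score; the per-agent mean is reported only for contrast."""
    preserved = (
        len(case.required & run.satisfied) / len(case.required) if case.required else 1.0
    )
    clobbered = sorted(case.contended & run.lost_writes)
    agent_mean = (
        sum(run.agent_scores.values()) / len(run.agent_scores) if run.agent_scores else None
    )
    return {
        "task": case.task_id,
        "constraint_preservation": preserved,
        "clobbered": clobbered,
        "system_pass": preserved == 1.0 and not clobbered,
        "mean_agent_score": agent_mean,
    }


case = EvalCase("export-free-tier", required={"exports_csv", "free_tier_supported"})
run = RunResult(satisfied={"exports_csv"}, agent_scores={"planner": 1.0, "implementer": 1.0})
print(score(case, run))
```

In this usage case both agents score perfectly in isolation, yet the run fails at the system level with a constraint preservation of 0.5, which is the gap the golden set is meant to expose.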

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

Structured handoff contracts

Pass task state between agents as a typed payload — requirements list, constraints, files in scope, acceptance checks — instead of prose summaries. "Build the export feature per the plan above" is where the free-tier requirement died; a required `requirements` field with "free tier: yes" cannot be silently compressed away.
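
A sketch of such a contract as a Python dataclass. The field names follow the paragraph above; the rule that every field must also be non-empty is an assumption about how strict a contract should be, not a fixed part of the technique:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    """Typed planner-to-implementer payload (illustrative field names)."""
    task: str
    requirements: dict[str, str]        # e.g. {"free tier": "yes"}
    files_in_scope: tuple[str, ...]
    acceptance_checks: tuple[str, ...]

    def __post_init__(self) -> None:
        # The constructor already requires every field; also reject empty ones,
        # so a prose-only handoff cannot pass as a valid contract.
        for name in ("requirements", "files_in_scope", "acceptance_checks"):
            if not getattr(self, name):
                raise ValueError(f"handoff is missing {name}")


ok = Handoff(
    task="Build the export feature",
    requirements={"free tier": "yes"},
    files_in_scope=("export.ts", "billing-gate.ts"),
    acceptance_checks=("free-tier user can export",),
)

try:
    Handoff("Build the export feature per the plan above", {}, (), ())
except ValueError as err:
    print(err)  # handoff is missing requirements
```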

Explicit ownership claims

Require agents to claim files and subtasks from a shared registry before touching them, with collisions rejected rather than merged silently. Agent B's overwrite of the refresh-path fix needed two things — a stale task list and unclaimed write access to auth.ts — and the claim protocol removes both.
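
A minimal in-process sketch of a claim registry. `OwnershipRegistry` and `ClaimRejected` are hypothetical names, and a deployment where agents run as separate processes would need the same check-and-set logic on a shared store rather than a `threading.Lock`:

```python
from __future__ import annotations

import threading


class ClaimRejected(Exception):
    """Raised when an agent tries to claim or write a target it does not own."""


class OwnershipRegistry:
    """Shared registry: one owner per file or subtask; conflicting claims are rejected."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}
        self._lock = threading.Lock()

    def claim(self, agent: str, target: str) -> None:
        with self._lock:
            owner = self._owners.get(target)
            if owner is not None and owner != agent:
                raise ClaimRejected(f"{target} is owned by {owner}; claim by {agent} rejected")
            self._owners[target] = agent  # re-claiming your own target is a no-op

    def release(self, agent: str, target: str) -> None:
        with self._lock:
            if self._owners.get(target) == agent:
                del self._owners[target]

    def check_write(self, agent: str, target: str) -> None:
        # Writes are allowed only on targets the agent currently holds.
        with self._lock:
            if self._owners.get(target) != agent:
                raise ClaimRejected(f"{agent} has not claimed {target}")


registry = OwnershipRegistry()
registry.claim("A", "auth.ts")
try:
    registry.claim("B", "auth.ts")  # agent B, working from a stale task list
except ClaimRejected as err:
    print(err)  # auth.ts is owned by A; claim by B rejected
```

Rejecting the claim, rather than merging B's patch, forces B to re-read current state before acting, which is what would have preserved the refresh-path fix.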

Constraint preservation checks

Have each receiving agent restate the constraints it inherited before starting work, and verify the restatement against the original request mechanically. The PASS example's "free tier — so no premium gate" line is this check passing; an implementer that cannot restate the constraint exposes the dropped handoff before the build starts.
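
A sketch of the mechanical comparison, assuming the receiving agent is prompted to restate constraints as `key: value` lines. That format and both helper functions are assumptions; the free-form restatement in the PASS transcript would need a parsing or normalization step first:

```python
from __future__ import annotations

import re

# Hypothetical restatement format: one "key: value" pair per line, optional bullet.
_PAIR = re.compile(r"\s*[-*]?\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_restatement(text: str) -> dict[str, str]:
    """Parse lines like '- free tier: yes' into {'free tier': 'yes'}."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        m = _PAIR.match(line)
        if m:
            pairs[m.group(1).lower()] = m.group(2).lower()
    return pairs


def verify_restatement(original: dict[str, str], restatement: str) -> list[str]:
    """Return problems: constraints missing from, or contradicted by, the restatement."""
    stated = parse_restatement(restatement)
    problems: list[str] = []
    for key, expected in original.items():
        got = stated.get(key.lower())
        if got is None:
            problems.append(f"missing: {key}")
        elif got != expected.lower():
            problems.append(f"mismatch: {key} (expected {expected!r}, got {got!r})")
    return problems


original = {"free tier": "yes", "format": "csv"}
print(verify_restatement(original, "- free tier: yes\n- format: csv"))  # []
print(verify_restatement(original, "free tier: no"))
# ["mismatch: free tier (expected 'yes', got 'no')", 'missing: format']
```

Checking values as well as keys catches the quieter failure where the constraint survives the handoff but arrives inverted.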