Failure Modes / Agency / Insufficient Agency

Insufficient Agency

Fails to take obvious, low-risk next steps that are required or strongly implied by the task. Unlike Missing Tool Invocation, the omitted action need not involve a tool. Unlike Clarification Overuse, the issue is silent passivity rather than asking an unnecessary clarifying question.

Examples

FAIL

Diagnosis delivered, action withheld

The task plainly implies fixing the problem, the fix is trivial and reversible, and the model stops at describing it.

User: The build is broken — sort it out, the team is blocked. Assistant: I found the issue: line 42 of webpack.config.js references a plugin that was removed in the last commit. Deleting that line would fix the build. Let me know if you'd like me to do that!
WARN

Literal request done, implied steps stranded

The asked-for line is fixed, but the same broken reference appears in two sibling configs the model just read — it mentions neither.

User: The build is broken — sort it out, the team is blocked. Assistant: Fixed — removed the stale plugin reference from webpack.config.js. The build passes now. [… webpack.prod.js and webpack.test.js contain the identical reference; their builds fail an hour later …]
PASS

Implied scope carried through

The model treats the goal — unblock the team — as the task, completing the obvious low-risk steps without waiting to be re-prompted.

User: The build is broken — sort it out, the team is blocked. Assistant: Fixed. The removed plugin was still referenced in webpack.config.js — and in the prod and test configs, so I cleaned all three. All builds pass; the team is unblocked.

Why it happens

01

Chat fine-tuning teaches an answer-and-stop rhythm. The dominant training pattern is one response per user turn, so carrying a task forward across obvious next steps without being re-prompted is out of distribution.

02

Safety and caution training overgeneralizes. Penalties for unauthorized or risky action push the model toward inaction in cases that are actually low-risk and clearly implied by the task.

03

The model treats unstated steps as unrequested. Implied next actions live in pragmatic context rather than literal instructions, and instruction-tuned models weight the literal request far more heavily than the inferred goal.

04

Preference data punishes overstepping more visibly than understepping. A rater notices an unwanted action immediately, while a missing next step often reads as a complete answer, so the asymmetry tunes models toward passivity.

05

Stopping is the cheapest fluent continuation. Ending with a summary or an offer to help further is always well-formed and well-represented in training data, making it a safe default whenever the model is uncertain whether to proceed.

06

System prompts in deployed products often instruct caution and deference to reduce liability, and the model overapplies that guidance to steps the user plainly expected it to take.

Detection Approaches

Categories of checks that can identify the issue. These are strategies, not specific implementations.

⚖️

LLM-as-judge evaluation

Ask the judge what obvious, low-risk steps the task implied and which of them the response completed. Describing the one-line fix and closing with "let me know if you'd like me to do that" is the signature — diagnosis delivered, trivially reversible action withheld.

📈

User correction mining

Mine follow-up turns for re-prompts that merely authorize the obvious — "yes, do it," "go ahead," "I meant fix it." A high rate of single-step approval turns after responses that stopped short measures systematic passivity directly from production traffic.

🧪

Golden-set evals

Build tasks whose goal plainly implies follow-through — a broken build with the same stale reference in three configs — and score against the implied scope, not the literal request. Partial credit for the named file only is exactly the miss worth measuring.

Mitigation Approaches

High-level reliability strategies that reduce how often this failure occurs.

🎚️

Risk-tiered autonomy policy

Define action tiers by reversibility and grant standing autonomy on the lowest tier — deleting one stale config line is a trivially undoable edit, not a decision needing sign-off. Safety tuning overgeneralizes caution; an explicit low-risk tier gives the model permission to act where deference was never wanted.

📝

Instruction constraints

Instruct the model to treat the stated goal, not the literal sentence, as the task — "sort it out" means the build works when you stop, including the sibling configs you just read. The answer-and-stop rhythm is the trained default, so carrying work forward has to be authorized in so many words.

Self-check pass

Before ending the turn, have the model list the obvious next steps the task implies and check each as done, deferred with a reason, or wrongly stranded. The warn example passes any output-quality check — only an explicit implied-steps pass catches the two configs left to fail an hour later.