Diagnosing Task Insensitivity in Language Agents

Large language models can serve as capable long-horizon agents, but their out-of-distribution (OOD) generalization remains weak. We identify task insensitivity as a key source of this failure: when faced with similar but distinct tasks, models may reapply patterns learned during training and fail to solve the task at hand.

We show that models often continue with actions aligned with the original task even when the instruction is semantically corrupted and cannot be directly answered. We further find that when the task description in a training prompt is replaced with a similar but distinct task, the model may still output the same action.
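The task-replacement probe described above can be made concrete as a simple rate: swap the instruction, keep the observation fixed, and count how often the action stays the same. The sketch below is an illustrative assumption, not the authors' evaluation code; the `Policy` signature, `insensitivity_rate`, and the two toy policies are all hypothetical stand-ins for a real agent.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical policy signature: (task, observation) -> action string.
Policy = Callable[[str, str], str]


def insensitivity_rate(policy: Policy, cases: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of (task, perturbed_task, observation) cases where the action is unchanged.

    A task-sensitive agent should usually change its action when the task
    changes; a high rate suggests the action depends on the observation alone.
    """
    if not cases:
        raise ValueError("need at least one case")
    unchanged = sum(
        policy(task, obs) == policy(perturbed, obs)
        for task, perturbed, obs in cases
    )
    return unchanged / len(cases)


# Toy policies, for illustration only.
def shortcut_policy(task: str, obs: str) -> str:
    # Ignores the task entirely and reacts to the observation alone.
    return "open drawer" if "drawer" in obs else "look"


def task_aware_policy(task: str, obs: str) -> str:
    # Acts on the object named in the task, if that object is visible.
    for obj in ("apple", "mug"):
        if obj in task and obj in obs:
            return f"take {obj}"
    return "look"


cases = [
    ("put the apple in the fridge", "put the mug in the fridge", "you see a drawer, an apple, a mug"),
    ("take the mug", "take the apple", "you see a mug and an apple"),
]
print(insensitivity_rate(shortcut_policy, cases))    # 1.0
print(insensitivity_rate(task_aware_policy, cases))  # 0.0
```

The shortcut policy scores 1.0 because it never consults the instruction, which is the failure mode the probe is meant to expose; a real evaluation would compare model outputs under original and perturbed prompts in the same way.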

This behavior is accompanied by a consistent training-time attention drift away from task tokens and toward local observations, suggesting an optimization bias toward shortcuts. To mitigate this problem, we propose Task-Perturbed NLL Optimization, a lightweight contrastive regularizer that explicitly encourages action dependence on the task instruction.
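The abstract does not give the exact form of Task-Perturbed NLL Optimization, so the following is only one plausible shape for a contrastive regularizer of this kind, written under stated assumptions: the usual NLL of the gold action under the true instruction, plus a hinge penalty whenever that action is not at least `margin` nats more likely under the true task than under a perturbed one. The function name `task_perturbed_loss` and the `margin` and `weight` parameters are hypothetical.

```python
import math


def task_perturbed_loss(
    logp_true: float,
    logp_perturbed: float,
    margin: float = 1.0,  # hypothetical: required log-likelihood gap in nats
    weight: float = 0.5,  # hypothetical: strength of the contrastive term
) -> float:
    """Standard NLL plus a hinge that rewards dependence on the task instruction.

    logp_true:      log p(a | task, obs) for the gold action under the real instruction.
    logp_perturbed: log p(a | task', obs) for the same action under a perturbed instruction.
    """
    nll = -logp_true
    # Gap between the two conditionals; a task-insensitive model has gap near 0.
    gap = logp_true - logp_perturbed
    # Penalize only when the gold action is not sufficiently more likely under the true task.
    hinge = max(0.0, margin - gap)
    return nll + weight * hinge


# The action is equally likely under both tasks: the full hinge penalty applies.
print(task_perturbed_loss(math.log(0.8), math.log(0.8)))
# The perturbed task makes the action much less likely: only the NLL term remains.
print(task_perturbed_loss(math.log(0.8), math.log(0.05)))
```

A hinge keeps the extra term inactive once the model already distinguishes the two tasks, so it should not push log-probabilities apart indefinitely; in practice the log-probabilities would come from the model's token-level outputs rather than constants.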

Extensive evaluations show that our intervention improves task sensitivity and OOD generalization while keeping attention on task tokens more stable.
