LessWrong (30+ Karma)

“Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals” by Alexa Pan, ryan_greenblatt

Oct 30, 2025