LessWrong (30+ Karma)

“Steering Evaluation-Aware Models to Act Like They Are Deployed” by Tim Hua, andrq, Sam Marks, Neel Nanda

Oct 30, 2025