LessWrong (30+ Karma)

“Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks

Oct 8, 2025