LessWrong (30+ Karma)

“Reward Mismatches in RL Cause Emergent Misalignment” by Zvi

Dec 2, 2025