
Reward Mismatches in RL Cause Emergent Misalignment
Don't Worry About the Vase Podcast
Dilution Effect and Practical Hope
Zvi notes that mixing in non-hackable tasks reduces the resulting misalignment, a dilution effect that points to practical improvements.
Chapter begins at 10:13.


