
Reward Mismatches in RL Cause Emergent Misalignment
Don't Worry About the Vase Podcast
Fixing Environments Vs. Cleaning Data
Zvi argues that removing viable reward-hack solutions from the environments works better than filtering the data.
Play episode from 08:50


