
#66 – Michael Cohen on Input Tampering in Advanced RL Agents
Hear This Idea
The Argument for Reward Is Not the Optimization Target
There are a couple of posts on the Alignment Forum, or somewhere like that. One is called "Reward is not the optimization target", which is by Alex Turner, and then someone wrote a follow-up called "Models don't 'get reward'". The argument makes a ton of sense when I imagine RL agents as, in some sense, wanting to get reward over as many time steps as possible. But maybe there's a sense in which that's actually just not quite accurate.
(Excerpt from 01:21:39 in the episode.)
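As a rough illustration of the distinction those posts draw (this sketch is not from the episode, and the toy environment and names below are made up): in a plain policy-gradient setup, reward enters training only as a scalar that scales the update to the policy's parameters. The trained policy is just a fixed mapping from observations to action probabilities; it never "receives" reward once training stops.

```python
# Hypothetical REINFORCE-style sketch (illustrative only). Reward appears
# solely as a coefficient on the gradient update; the deployed policy is
# the parameter vector `theta`, which contains no notion of "getting" reward.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # logits over two actions; the only thing training changes

def act(theta):
    """Sample an action from the softmax policy; return (action, probs)."""
    probs = np.exp(theta) / np.exp(theta).sum()
    return rng.choice(2, p=probs), probs

def reward(action):
    """Toy one-step environment: action 1 pays off more often than action 0."""
    return float(rng.random() < (0.8 if action == 1 else 0.2))

lr = 0.1
for _ in range(2000):
    a, probs = act(theta)
    r = reward(a)
    # grad of log softmax w.r.t. theta for the chosen action: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # Reward is just a scalar weight on this update, nothing more.
    theta += lr * r * grad_log_pi

# After training, the policy is a fixed action distribution; reward was part
# of the update rule, not part of the deployed artifact.
print(act(theta)[1])
```

On this picture, asking whether the agent "wants reward over as many time steps as possible" is a claim about what the learned parameters end up computing, not something guaranteed by the training setup itself, which is roughly the point at issue in the question above.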


