#66 – Michael Cohen on Input Tampering in Advanced RL Agents

Hear This Idea

The Argument for Reward Is Not the Optimization Target

There are a couple of posts, on the Alignment Forum or somewhere else. One is called "Reward Is Not the Optimization Target", by Alex Turner, and then someone wrote a follow-up called "Models Don't 'Get Reward'". The argument makes a ton of sense when I imagine RL agents as, in some sense, wanting to get reward over as many time steps as possible. But maybe there's a sense in which that's actually not quite accurate.

Transcript excerpt from 01:21:39.
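
The distinction the question gestures at can be made concrete with a standard policy-gradient update. Below is a minimal, illustrative REINFORCE-style sketch (not from the episode; the function and argument names are hypothetical): reward enters training only as a scalar weight on the gradients of the action log-probabilities, so it shapes the policy's parameters rather than being something the trained policy "gets" or pursues at run time.

```python
import torch

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
    """One REINFORCE step: discounted returns weight the log-prob gradients."""
    # Compute returns-to-go: G_t = r_t + gamma * G_{t+1}
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    # Reward only appears here, as a coefficient on grad log pi(a_t | s_t).
    # It is consumed by the update that reshapes the policy's parameters;
    # the policy itself never takes "reward" as an input or explicit goal.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```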
