
3. Evan Hubinger on Takeoff Speeds, Risks from Learned Optimization & Interpretability

The Inside View


Recursive Reward Modeling - What's the Difference?

In imitative amplification, we have this amplification operator, Amp(M). That's how we produce a more powerful version of the model. In recursive reward modeling, we also have an amplification operator. But then what's the difference between trying to infer human preferences and reward modeling?

It's not our reward function, because we learn it. So a reward model is a learned model. Right, it's like a model of the reward. When you're doing standard RL, you usually have the reward function given to you. That's the difference: there it's fixed, while the reward model is not, and the models are all built on each other here.
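To make that distinction concrete, here is a minimal sketch, not from the episode: it contrasts a hand-coded reward function (standard RL) with a reward model learned from pairwise human preferences via a Bradley-Terry-style update. All names here (RewardModel, human_prefers, the linear parameterization, comparing single states rather than trajectories) are illustrative assumptions, not anything Evan describes.

```python
# Sketch: fixed reward function vs. learned reward model.
import numpy as np

rng = np.random.default_rng(0)

# Standard RL: the reward is a hand-coded function the designer writes down.
def hand_coded_reward(state: np.ndarray) -> float:
    return float(-np.sum(state**2))  # e.g. "stay near the origin"

# Reward modeling: the reward is a *learned* model, fit to human
# preference comparisons (here over states, for brevity; usually trajectories).
class RewardModel:
    def __init__(self, dim: int):
        self.w = np.zeros(dim)  # linear reward, purely for illustration

    def reward(self, state: np.ndarray) -> float:
        return float(self.w @ state)

    def update(self, preferred: np.ndarray, rejected: np.ndarray, lr: float = 0.1):
        # Bradley-Terry model: P(preferred beats rejected) = sigmoid(margin).
        # Gradient ascent on the log-likelihood of the human's label.
        margin = self.reward(preferred) - self.reward(rejected)
        p_wrong = 1.0 / (1.0 + np.exp(margin))  # sigmoid(-margin)
        self.w += lr * p_wrong * (preferred - rejected)

# Simulated "human" who secretly prefers states with a larger first coordinate.
def human_prefers(a: np.ndarray, b: np.ndarray) -> bool:
    return a[0] > b[0]

model = RewardModel(dim=3)
for _ in range(1000):
    a, b = rng.normal(size=3), rng.normal(size=3)
    if human_prefers(a, b):
        model.update(a, b)
    else:
        model.update(b, a)

print("learned weights:", model.w)  # first coordinate should dominate
```

Recursive reward modeling then closes the loop by letting agents trained against the learned reward assist the human in judging harder comparisons; the sketch above only shows the non-recursive base step.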

