
3. Evan Hubinger on Takeoff Speeds, Risks from Learned Optimization & Interpretability

The Inside View


Recursive Reward Modeling - What's the Difference?

In imitative amplification, we have this amplification operator, Amp(M). That's how we produce a more powerful version of the model. In recursive reward modeling, we also have an amplification operator. But then what's the difference between trying to infer human preferences and reward modeling?

It's not our reward function, because we learn it. So a reward model is a learned model. Right, it's like a model of the reward. When you're doing standard RL, you usually have the reward function given to you. That's the difference: there it's fixed, while the reward model is not, and the models are all built on each other here.
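To make that distinction concrete, here is a minimal sketch, not from the episode: it contrasts a hand-coded reward function (standard RL) with a reward model learned from pairwise human preferences via a Bradley-Terry-style update. All names here (RewardModel, human_prefers, the linear parameterization, comparing single states rather than trajectories) are illustrative assumptions, not anything Evan describes.

```python
# Sketch: fixed reward function vs. learned reward model.
import numpy as np

rng = np.random.default_rng(0)

# Standard RL: the reward is a hand-coded function the designer writes down.
def hand_coded_reward(state: np.ndarray) -> float:
    return float(-np.sum(state**2))  # e.g. "stay near the origin"

# Reward modeling: the reward is a *learned* model, fit to human
# preference comparisons (here over states, for brevity; usually trajectories).
class RewardModel:
    def __init__(self, dim: int):
        self.w = np.zeros(dim)  # linear reward, purely for illustration

    def reward(self, state: np.ndarray) -> float:
        return float(self.w @ state)

    def update(self, preferred: np.ndarray, rejected: np.ndarray, lr: float = 0.1):
        # Bradley-Terry model: P(preferred beats rejected) = sigmoid(margin).
        # Gradient ascent on the log-likelihood of the human's label.
        margin = self.reward(preferred) - self.reward(rejected)
        p_wrong = 1.0 / (1.0 + np.exp(margin))  # sigmoid(-margin)
        self.w += lr * p_wrong * (preferred - rejected)

# Simulated "human" who secretly prefers states with a larger first coordinate.
def human_prefers(a: np.ndarray, b: np.ndarray) -> bool:
    return a[0] > b[0]

model = RewardModel(dim=3)
for _ in range(1000):
    a, b = rng.normal(size=3), rng.normal(size=3)
    if human_prefers(a, b):
        model.update(a, b)
    else:
        model.update(b, a)

print("learned weights:", model.w)  # first coordinate should dominate
```

Recursive reward modeling then closes the loop by letting agents trained against the learned reward assist the human in judging harder comparisons; the sketch above only shows the non-recursive base step.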

