AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Recursive Reward Modeling - What's the Difference?
In imitative application, we have this amplication operator, mp m. This is how we produce a more powerful version of the model. In recursively reward modeling, we have an Amplification Operator i. But what's the difference between trying to infer from human preferences and reard muddling? O chris: It's not our word function, because we learn it. So a word model is a learned model. Rit, so he's like a model. Ta, when you're trying to do re you else wo have like a model of the gard function. Teora differentwile the word model is not. We are all in on each other here.