Generalization and Reward Models in Machine Learning
The prompt is guiding the model. It's like, what corner of the internet do we want to imitate here? And maybe we want to instruct you. So on generalization, yeah, I think language models generalize quite well. One of the tricky pieces about RL from human feedback is that you have this reward model and you're actually training against it.