How Do You Train a Reward Model?
We hire a set of contractors to label data for us, and we essentially do an extra fine-tuning stage on top of the normal language model pre-training stage. That involves three steps, which I think we'll get into in a bit. But essentially the goal is to use reinforcement learning to try to produce outputs that are closer to the ones a human would prefer or rank highly.
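To make the idea of learning from human rankings concrete, here is a minimal sketch of one common way to fit a reward model: a pairwise preference loss, where the model is pushed to score the response a labeler preferred above the one they rejected. The speaker does not name a loss or a model in this excerpt, so all of this is an assumption for illustration. The linear reward, the two-number feature vectors, and the names `reward`, `pairwise_loss`, `train`, and `PAIRS` are hypothetical; a real system would score text with a language model rather than hand-made features.

```python
import math

def reward(weights, features):
    """Hypothetical linear reward: a weighted sum of response features."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, preferred, rejected):
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-margin))."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train(pairs, dim, lr=0.1, epochs=200):
    """Plain gradient descent on the pairwise loss over all labeled pairs."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # dLoss/dmargin = -(1 - sigmoid(margin)) = -1 / (1 + exp(margin))
            grad_scale = -1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (preferred[i] - rejected[i])
    return weights

# Made-up labels: each pair is (features of preferred response, features of rejected one).
PAIRS = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.3], [0.1, 0.4]),
]

WEIGHTS = train(PAIRS, dim=2)

for preferred, rejected in PAIRS:
    print(reward(WEIGHTS, preferred) > reward(WEIGHTS, rejected))
```

Once a model like this scores preferred responses higher, its output can serve as the reward signal for the reinforcement learning stage described above.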