RL From Human Feedback

RL from human feedback is basically a way to train models based on human feedback. It could be something like a language model, where you can sample different continuations. You use the feedback as a signal to do reinforcement learning, to optimize the output to get high reward. And then from there, you can use that for what they call a reward model, which predicts the reward of a given output from the model.
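To make the loop described above concrete, here is a minimal toy sketch, not how any production system does it. Everything in it is an assumption for illustration: the three canned continuations, the made-up human ratings, the "reward model" that simply averages those ratings (real reward models are usually learned networks, often trained on pairwise comparisons), and the plain REINFORCE update on a softmax policy.

```python
import math
import random

# Hypothetical stand-ins for continuations sampled from a language model.
CONTINUATIONS = ["helpful answer", "vague answer", "rude answer"]

# Made-up human feedback: (continuation index, rating) pairs.
HUMAN_RATINGS = [(0, 1.0), (0, 0.8), (1, 0.3), (1, 0.5), (2, -1.0), (2, -0.8)]


def fit_reward_model(ratings, n_outputs):
    """Toy reward model: predict the mean human rating for each output."""
    totals = [0.0] * n_outputs
    counts = [0] * n_outputs
    for idx, rating in ratings:
        totals[idx] += rating
        counts[idx] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(reward_model, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a softmax policy over the fixed continuations."""
    rng = random.Random(seed)
    logits = [0.0] * len(reward_model)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one continuation from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Score it with the reward model instead of asking a human again.
        reward = reward_model[action]
        # Expected reward under the policy, used as a variance-reducing baseline.
        baseline = sum(p * r for p, r in zip(probs, reward_model))
        advantage = reward - baseline
        # Gradient of log pi(action) with respect to each logit.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


reward_model = fit_reward_model(HUMAN_RATINGS, len(CONTINUATIONS))
final_probs = train_policy(reward_model)
best = max(range(len(CONTINUATIONS)), key=lambda i: final_probs[i])
print(CONTINUATIONS[best], [round(p, 3) for p in final_probs])
```

The point of the split is that humans only rate a small set of samples; the reward model then stands in for them during the many reinforcement learning steps that push the policy toward higher-reward outputs.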

