
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View


RL From Human Feedback

RL from human feedback is basically a way to train models based on human feedback. It could be a language model where you can sample different continuations and have people give feedback on them. From there, you can train what they call a reward model to predict the reward of a given output from the model. You then use that as a signal to do reinforcement learning, optimizing the model's outputs to get high reward.
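The loop described above (sample continuations, predict their reward, reinforce the high-reward ones) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method discussed in the episode: the continuations, the ratings, and the function names are all hypothetical, the "reward model" is just the mean human rating per continuation, and the RL step is plain REINFORCE with a baseline. Real systems would typically use a learned neural reward model and a more elaborate RL algorithm.

```python
import math
import random

# Hypothetical toy setup: the "policy" picks one of a few fixed continuations.
CONTINUATIONS = ["helpful answer", "vague answer", "rude answer"]

# Hypothetical human feedback: (continuation index, rating between 0 and 1).
HUMAN_RATINGS = [(0, 1.0), (0, 0.8), (1, 0.5), (1, 0.3), (2, 0.0), (2, 0.1)]


def fit_reward_model(ratings, n_outputs):
    """Stand-in reward model (assumption): mean human rating per continuation."""
    totals = [0.0] * n_outputs
    counts = [0] * n_outputs
    for idx, rating in ratings:
        totals[idx] += rating
        counts[idx] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def softmax(logits):
    """Turn logits into a probability distribution over continuations."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train_policy(reward_model, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a softmax policy, using predicted reward as the signal."""
    rng = random.Random(seed)
    logits = [0.0] * len(reward_model)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a continuation from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # Score it with the reward model instead of asking a human again.
        reward = reward_model[action]
        # Baseline: expected predicted reward under the current policy.
        baseline = sum(p * r for p, r in zip(probs, reward_model))
        advantage = reward - baseline
        # Gradient of log pi(action) with respect to each logit.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


reward_model = fit_reward_model(HUMAN_RATINGS, len(CONTINUATIONS))
final_probs = train_policy(reward_model)
best = CONTINUATIONS[final_probs.index(max(final_probs))]
print(best, [round(p, 3) for p in final_probs])
```

The point of the reward model here is that the policy can be updated for thousands of steps without collecting a new human rating at every step; the human ratings are only used once, to fit the predictor.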
