Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

How to Do a Back Flip in a Language Model?

In a language model, you can think of each token, or word, as an action. And the reward you get depends on how good your output is compared to the other ones. So imagine you have ten different outputs for a prompt. The human labelers will compare the different outputs. Then, if your output was always picked as the best one, it will get a high reward, or a high score, from the labelers. Yeah, that's roughly it. I think it can be done in different ways.
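
To make the comparison idea concrete, here is a minimal sketch, assuming labelers give pairwise preferences between outputs for the same prompt and that an output's score is simply the fraction of its comparisons that it wins. The function name `win_rate_scores` and the example labels are hypothetical, not taken from the episode; as the speaker notes, this can be done in different ways, and in practice a reward model is often trained on such comparisons instead of using raw win rates.

```python
from collections import defaultdict


def win_rate_scores(comparisons):
    """Turn pairwise labeler preferences into a score per output.

    comparisons: list of (winner_id, loser_id) pairs for one prompt.
    Returns a dict mapping each output id to its win rate in [0, 1].
    This is an illustrative assumption, not a specific published method.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {output_id: wins[output_id] / total[output_id] for output_id in total}


# Hypothetical labels: output "a" beats "b" and "c"; "b" beats "c".
comparisons = [("a", "b"), ("a", "c"), ("b", "c")]
scores = win_rate_scores(comparisons)
best = max(scores, key=scores.get)  # the output labelers always preferred
```

An output that wins every comparison, like "a" here, ends up with the highest score, which matches the description of an output that is always picked as the best one receiving a high reward.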
