
Ethan Perez – Inverse Scaling, Language Feedback, Red Teaming

The Inside View


Is There a Reward Signal for Using Prompt Engineering?

So that kind of prompting is good at getting diverse examples, but it's limited in that, as I mentioned, you get something like 3.6% of examples that are harmful from the model. For an even safer model, which is what we want to get to, you might need one in a thousand, or one in ten thousand examples, to generate some example of harmful behavior. And so then you want more targeted methods for doing this adversarial attack. You want exploration specifically aimed at eliciting this harmful content. So that's basically the goal of this RL approach, where you take this language model that's
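A minimal sketch of what such an RL red-teaming loop might look like, assuming a red-team generator, a target model, and a harmfulness classifier as the reward signal; all function names here are illustrative stubs, not the actual models or training setup discussed in the episode:

```python
import random

# Illustrative stubs -- in practice these would be real language models
# and a trained harmfulness/offensiveness classifier.
def red_team_generate(num_prompts):
    """Sample candidate attack prompts from the red-team language model."""
    return [f"candidate attack prompt #{i}" for i in range(num_prompts)]

def target_model_reply(prompt):
    """Get the target model's reply to a red-team prompt."""
    return f"reply to: {prompt}"

def harmfulness_score(reply):
    """Classifier score in [0, 1]; higher means more harmful content was elicited."""
    return random.random()  # placeholder for a real classifier

def update_red_team_model(prompts, rewards):
    """Placeholder for an RL update (e.g. policy gradient) that pushes the
    red-team model toward prompts that earned high harmfulness reward."""
    pass

# One iteration of the loop: generate prompts, score the target's replies,
# and reinforce the prompts that elicited harmful behavior.
prompts = red_team_generate(num_prompts=64)
rewards = [harmfulness_score(target_model_reply(p)) for p in prompts]
update_red_team_model(prompts, rewards)

attack_rate = sum(r > 0.5 for r in rewards) / len(rewards)
print(f"fraction of prompts eliciting harmful replies: {attack_rate:.3f}")
```

The point of the RL formulation is exactly the targeted exploration described above: rather than sampling prompts blindly and keeping the rare harmful ones, the generator is steadily steered toward the regions of prompt space where the target model fails.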
