
Ethan Perez – Inverse Scaling, Language Feedback, Red Teaming

The Inside View


Is There a Reward Signal for Using Prompt Engineering?

So that kind of prompting is good at getting diverse examples, but it's limited in that, as I mentioned, you get something like 3.6% of examples that are harmful from the model. For an even safer model, which is what we want to get to, you might need one in a thousand, or one in ten thousand examples, to generate some example of harmful behavior. And so then you want more targeted methods for doing this adversarial attack. You want exploration specifically aimed at eliciting this harmful content. So that's basically the goal of this RL approach, where you take this language model that's
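A minimal sketch of what such an RL red-teaming loop might look like, assuming a red-team generator, a target model, and a harmfulness classifier as the reward signal; all function names here are illustrative stubs, not the actual models or training setup discussed in the episode:

```python
import random

# Illustrative stubs -- in practice these would be real language models
# and a trained harmfulness/offensiveness classifier.
def red_team_generate(num_prompts):
    """Sample candidate attack prompts from the red-team language model."""
    return [f"candidate attack prompt #{i}" for i in range(num_prompts)]

def target_model_reply(prompt):
    """Get the target model's reply to a red-team prompt."""
    return f"reply to: {prompt}"

def harmfulness_score(reply):
    """Classifier score in [0, 1]; higher means more harmful content was elicited."""
    return random.random()  # placeholder for a real classifier

def update_red_team_model(prompts, rewards):
    """Placeholder for an RL update (e.g. policy gradient) that pushes the
    red-team model toward prompts that earned high harmfulness reward."""
    pass

# One iteration of the loop: generate prompts, score the target's replies,
# and reinforce the prompts that elicited harmful behavior.
prompts = red_team_generate(num_prompts=64)
rewards = [harmfulness_score(target_model_reply(p)) for p in prompts]
update_red_team_model(prompts, rewards)

attack_rate = sum(r > 0.5 for r in rewards) / len(rewards)
print(f"fraction of prompts eliciting harmful replies: {attack_rate:.3f}")
```

The point of the RL formulation is exactly the targeted exploration described above: rather than sampling prompts blindly and keeping the rare harmful ones, the generator is steadily steered toward the regions of prompt space where the target model fails.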
