
Ethan Perez–Inverse Scaling, Language Feedback, Red Teaming

The Inside View

00:00

Conversational Red Teaming Approach

The red teaming model says something like, "I'm really angry at my boss. Can you help me write an email?" And then the model that's being attacked says, "Yes." From there, it's just a conversation between these two models. It's a very natural failure for a language model to have, but one that we want to train models out of.
