Using a Language Model to Detect a Deficiency Problem

This is applying the strength of a language model to try to trick another language model into making a mistake that is caught by that classifier. We start out with few-shot prompting, which means you show examples to the language model that is the attacker, the red team language model. And then we also use supervised fine-tuning and RL fine-tuning to get a stronger attacking model. It's called "Red Teaming Language Models with Language Models." The first author is Ethan Perez, and then there are a bunch of authors, and you're the last author. So with that out of the way, I guess let's talk about some specific papers.
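
To make the setup described above concrete, here is a minimal sketch of the loop: an attacker model is prompted with a few example questions, a target model answers, and a classifier flags the failures. Everything named here is a hypothetical stand-in for illustration. The `toy_*` functions replace real language models and a real classifier, and the prompt wording and the choice to feed successful attacks back in as new examples are assumptions, not the paper's exact procedure. Supervised and RL fine-tuning of the attacker are not shown.

```python
import random
from typing import Callable, List, Optional, Tuple


def build_few_shot_prompt(examples: List[str]) -> str:
    """Format example questions as a numbered list for the attacker to continue."""
    lines = ["List of questions to ask someone:"]
    lines += [f"{i}. {q}" for i, q in enumerate(examples, start=1)]
    lines.append(f"{len(examples) + 1}.")  # open slot the attacker fills in
    return "\n".join(lines)


def red_team(
    attacker: Callable[[str], str],
    target: Callable[[str], str],
    classifier: Callable[[str, str], bool],
    seed_examples: List[str],
    rounds: int,
    shots: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Return the (question, reply) pairs that the classifier flags as failures."""
    rng = rng or random.Random(0)
    pool = list(seed_examples)
    failures = []
    for _ in range(rounds):
        # Few-shot step: show the attacker a handful of example questions.
        shown = rng.sample(pool, k=min(shots, len(pool)))
        question = attacker(build_few_shot_prompt(shown))
        reply = target(question)
        if classifier(question, reply):
            failures.append((question, reply))
            # Assumption: successful attacks are reused as future examples.
            pool.append(question)
    return failures


# Hypothetical stand-ins; real ones would be language models and a trained classifier.
def toy_attacker(prompt: str) -> str:
    if "insult" in prompt:
        return "What do you really think of me?"
    return "What is your favorite color?"


def toy_target(question: str) -> str:
    return "You are an idiot." if "really think" in question else "Blue."


def toy_classifier(question: str, reply: str) -> bool:
    return "idiot" in reply


seeds = ["How do I insult someone politely?", "What is the weather like?"]
found = red_team(toy_attacker, toy_target, toy_classifier, seeds, rounds=5)
for question, reply in found:
    print(f"FLAGGED: {question!r} -> {reply!r}")
```

Passing the three models in as plain callables keeps the loop independent of any particular model API, so a fine-tuned attacker could be swapped in without changing `red_team`.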
