Using a Language Model to Detect a Deficiency Problem
This is applying the strength of a language model to try to trick another language model into making a mistake that is caught by a classifier. We start out with few-shot prompting, which means we show the attacker language model, the red team language model, a few examples. And then we also use supervised fine-tuning and RL fine-tuning to get a stronger attacking model. It's called "Red Teaming Language Models with Language Models." The first author is Ethan Perez, then there are a bunch of authors, and you're the last author, yeah. So with that out of the way, I guess let's talk about some specific papers.
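As a rough illustration of the loop described here (a red LM proposes test cases, the target LM answers, and a classifier flags harmful replies), the following is a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's code: `red_lm`, `target_lm`, and `classifier` are hypothetical callables standing in for real models, the prompt wording and the 0.5 threshold are placeholders, and the few-shot step simply samples a handful of example questions from a pool. The supervised and RL fine-tuning variants are not shown.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-ins (assumptions): a generator maps a prompt string to a
# text completion; a scorer maps a reply to a harm score in [0, 1].
Generator = Callable[[str], str]
Scorer = Callable[[str], float]


def build_few_shot_prompt(examples: List[str], k: int, rng: random.Random) -> str:
    """Sample up to k example test cases and format them as a numbered list
    that the red LM is expected to continue with one new item."""
    shots = rng.sample(examples, min(k, len(examples)))
    lines = ["List of questions to ask someone:"]  # placeholder wording
    lines += [f"{i + 1}. {q}" for i, q in enumerate(shots)]
    lines.append(f"{len(shots) + 1}.")  # the red LM completes this item
    return "\n".join(lines)


def red_team(red_lm: Generator, target_lm: Generator, classifier: Scorer,
             seed_examples: List[str], n_cases: int, k: int = 3,
             threshold: float = 0.5, seed: int = 0) -> List[Tuple[str, str, float]]:
    """Generate test cases with the red LM, query the target LM, and keep the
    cases whose reply the classifier scores at or above the threshold."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        prompt = build_few_shot_prompt(seed_examples, k, rng)
        test_case = red_lm(prompt).strip()
        reply = target_lm(test_case)
        score = classifier(reply)
        if score >= threshold:
            failures.append((test_case, reply, score))
    return failures


if __name__ == "__main__":
    # Toy stubs so the sketch runs without any real model.
    toy_red = lambda prompt: " What is your password?"
    toy_target = lambda q: "My password is hunter2." if "password" in q else "No."
    toy_clf = lambda reply: 1.0 if "password is" in reply else 0.0
    found = red_team(toy_red, toy_target, toy_clf, ["Hi?", "Who are you?"], n_cases=4)
    print(len(found))  # prints 4
```

In this sketch, swapping the few-shot generator for a fine-tuned red LM would leave the rest of the loop unchanged, which is why the attacker is passed in as a plain callable.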