24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

The Discriminator-Critique Gap

In the critiques paper that we published last year, you basically do randomized controlled trials with targeted perturbations. You train the model essentially to be a discriminator between the good version and the flawed version. And then you check: if you ask the model, or the RLHF version of the model, to write a critique of the code, how often does it actually write about the flaw? Now you get this critique accuracy equivalent. And the difference between the two is what we call the discriminator-critique gap.
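To make the comparison concrete, here is a minimal sketch of how such a gap could be computed from per-trial outcomes. It assumes each trial records two booleans: whether the discriminator picked out the flawed version, and whether the written critique mentioned the introduced flaw. The names `TrialResult`, `accuracy`, and `discriminator_critique_gap` are hypothetical, and the four trials are made-up placeholder values, not results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TrialResult:
    """Outcome of one perturbed example (hypothetical structure)."""
    discriminator_correct: bool    # model told the flawed version from the good one
    critique_mentions_flaw: bool   # model's written critique pointed at the flaw


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values in a non-empty sequence of booleans."""
    values = list(flags)
    if not values:
        raise ValueError("need at least one trial")
    return sum(values) / len(values)


def discriminator_critique_gap(trials: List[TrialResult]) -> float:
    """Discriminator accuracy minus critique accuracy over the same trials."""
    disc = accuracy(t.discriminator_correct for t in trials)
    crit = accuracy(t.critique_mentions_flaw for t in trials)
    return disc - crit


# Placeholder outcomes for illustration only.
trials = [
    TrialResult(discriminator_correct=True, critique_mentions_flaw=True),
    TrialResult(discriminator_correct=True, critique_mentions_flaw=False),
    TrialResult(discriminator_correct=True, critique_mentions_flaw=True),
    TrialResult(discriminator_correct=False, critique_mentions_flaw=False),
]

gap = discriminator_critique_gap(trials)
print(f"discriminator-critique gap: {gap:.2f}")
```

A positive gap under this definition would mean the model detects flaws more often than it writes about them; how the paper normalizes or reports the two accuracies may differ from this sketch.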
