24 - Superalignment with Jan Leike

AXRP - the AI X-risk Research Podcast

The Discriminator-Critique Gap

In the critiques paper that we published last year, you basically do randomized controlled trials with targeted perturbations, training the model essentially to be a discriminator between the good version and the flawed version. And then you check: if you ask the model, or the RLHF version of the model, to write a critique of the code, how often does it actually write about the flaw? Now you get this critique accuracy equivalent. And that's what we call the discriminator-critique gap.
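As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes the gap is discriminator accuracy minus critique accuracy over the same set of perturbed examples; the excerpt does not spell out the exact formula. The `Trial` record, the function names, and the sample data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Trial:
    """One perturbed example (hypothetical record, for illustration only)."""
    discriminator_correct: bool   # discriminator told the flawed version from the good one
    critique_mentions_flaw: bool  # the written critique pointed at the inserted flaw


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values in a non-empty collection of booleans."""
    values = list(flags)
    if not values:
        raise ValueError("need at least one trial")
    return sum(values) / len(values)


def discriminator_critique_gap(trials: List[Trial]) -> float:
    """Assumed definition: discriminator accuracy minus critique accuracy."""
    disc_acc = accuracy(t.discriminator_correct for t in trials)
    crit_acc = accuracy(t.critique_mentions_flaw for t in trials)
    return disc_acc - crit_acc


# Made-up outcomes for four perturbed code samples.
trials = [
    Trial(discriminator_correct=True, critique_mentions_flaw=True),
    Trial(discriminator_correct=True, critique_mentions_flaw=False),
    Trial(discriminator_correct=True, critique_mentions_flaw=True),
    Trial(discriminator_correct=False, critique_mentions_flaw=False),
]
gap = discriminator_critique_gap(trials)
print(f"discriminator-critique gap: {gap:.2f}")
```

Under this assumed definition, a positive gap would mean the model can tell the flawed version apart more often than its critiques actually mention the flaw.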
