
16 - Preparing for Debate AI with Geoffrey Irving

AXRP - the AI X-risk Research Podcast


Is There a Better Detector Quality for Detecting Failures?

The idea is you have some mechanism that's imperfect for detecting problems. And so we're just going to throw a bunch of computation power at the problem. So that's both of the form like, generate a lot of samples and just see if they work, if they cause failures. Also use the strength of a language model to probe likely failure points. It relies on there being a detector which doesn't have too many negatives.
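As a rough illustration of the "generate a lot of samples and check them with an imperfect detector" idea, here is a toy Python sketch. Everything in it is an assumption made for illustration and does not come from the episode: `generate_candidates` stands in for a language model proposing test inputs, `target_fails` is a made-up ground truth for the system under test, and `imperfect_detector` is a detector that misses a fraction of real failures (`miss_rate`) but never flags a non-failure.

```python
import random


def generate_candidates(n, seed=0):
    """Hypothetical stand-in for a language model proposing n test inputs."""
    rng = random.Random(seed)
    return [rng.randint(0, 999) for _ in range(n)]


def target_fails(x):
    """Made-up ground truth: the system under test fails on multiples of 50."""
    return x % 50 == 0


def imperfect_detector(x, rng, miss_rate=0.3):
    """Flag a real failure unless it is missed with probability miss_rate.

    This toy detector never flags an input that is not a failure.
    """
    return target_fails(x) and rng.random() >= miss_rate


def red_team(n_samples, seed=0, miss_rate=0.3):
    """Sample many candidates and keep the ones the detector flags."""
    detector_rng = random.Random(seed + 1)
    candidates = generate_candidates(n_samples, seed)
    return [x for x in candidates
            if imperfect_detector(x, detector_rng, miss_rate)]


if __name__ == "__main__":
    flagged = red_team(5000)
    print(f"flagged {len(flagged)} failing inputs out of 5000 samples")
```

Because the detector misses some failures, any single sample is an unreliable test; drawing more samples is what gives each failure mode repeated chances to be caught. If `miss_rate` is 1.0, nothing is ever flagged, however many samples are drawn.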

