
16 - Preparing for Debate AI with Geoffrey Irving
AXRP - the AI X-risk Research Podcast
Is There a Better Detector Quality for Detecting Failures?
The idea is you have some mechanism that's imperfect for detecting problems. And so we're just going to throw a bunch of computation power at the problem. So that's both of the form like, generate a lot of samples and just see if they work, if they cause failures. Also use the strength of a language model to probe likely failure points. It relies on there being a detector which doesn't have too many negatives.
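The "generate a lot of samples and see if they cause failures" loop can be sketched roughly as follows. This is only an illustration and not code from the episode: `find_failures`, `toy_generate`, and `toy_detector` are hypothetical names, and the toy detector is deliberately imperfect (it misses one failure case) to mirror the point that the approach only finds what the detector can flag.

```python
import random
from typing import Callable, List


def find_failures(
    generate: Callable[[random.Random], int],
    detector: Callable[[int], bool],
    n_samples: int,
    seed: int = 0,
) -> List[int]:
    """Draw many samples and keep the ones the detector flags as failures."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_samples):
        sample = generate(rng)
        if detector(sample):
            failures.append(sample)
    return failures


# Hypothetical stand-in for a generator such as a language model.
def toy_generate(rng: random.Random) -> int:
    return rng.randint(0, 99)


# Hypothetical imperfect detector: in this toy setup every multiple of 10
# is a "failure", but the detector misses 50, so 50 is never reported.
def toy_detector(x: int) -> bool:
    return x % 10 == 0 and x != 50


failures = find_failures(toy_generate, toy_detector, n_samples=1000)
print(sorted(set(failures)))
```

Spending more compute here just means raising `n_samples`; failures the detector misses stay invisible no matter how many samples are drawn.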