
16 - Preparing for Debate AI with Geoffrey Irving

AXRP - the AI X-risk Research Podcast

CHAPTER

Is There a Better Detector Quality for Detecting Failures?

The idea is that you have some mechanism that's imperfect for detecting problems, and so you just throw a bunch of computation power at the problem. That takes two forms: generate a lot of samples and see if they cause failures, and also use the strength of a language model to probe likely failure points. It relies on there being a detector which doesn't have too many negatives.
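The loop described above can be sketched roughly as follows. This is a minimal illustration, not the method discussed in the episode: `find_failures`, `make_toy_generator`, `toy_target`, and `toy_detector` are all hypothetical names, and the toy functions are simple stand-ins for what would in practice be a language model proposing test inputs, the system under test, and an imperfect learned classifier.

```python
import random
from typing import Callable, List, Tuple


def find_failures(
    generate: Callable[[], str],
    target: Callable[[str], str],
    detector: Callable[[str], bool],
    num_samples: int,
) -> List[Tuple[str, str]]:
    """Sample many inputs and keep the (input, output) pairs the detector flags."""
    failures = []
    for _ in range(num_samples):
        prompt = generate()          # in practice: a language model proposing test cases
        response = target(prompt)    # the system being probed
        if detector(response):       # imperfect check for a problem
            failures.append((prompt, response))
    return failures


# Toy stand-ins (assumptions for illustration only).
def make_toy_generator(seed: int) -> Callable[[], str]:
    rng = random.Random(seed)
    topics = ["the weather", "cooking", "the rival team", "homework"]
    return lambda: f"Tell me about {rng.choice(topics)}"


def toy_target(prompt: str) -> str:
    # Misbehaves on one topic so there is something to find.
    if "rival team" in prompt:
        return "They are idiots."
    return "Happy to help."


def toy_detector(response: str) -> bool:
    # A crude keyword check standing in for a learned classifier.
    return "idiots" in response


if __name__ == "__main__":
    found = find_failures(make_toy_generator(0), toy_target, toy_detector, 200)
    print(f"{len(found)} flagged samples out of 200")
```

In this sketch, more computation simply means a larger `num_samples`. Anything the detector misses never appears in the result, which is why the approach depends on the quality of the detector.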
