
38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
AXRP - the AI X-risk Research Podcast
Evaluating AI Sabotage Risks
This chapter explores the difficulty of assessing whether advanced AI models could sabotage human decision-making. It discusses the challenges of designing evaluations that meaningfully measure AI behavior, including the need for comprehensive oversight and for an understanding of where vulnerabilities lie. The conversation emphasizes strategies for mitigating these risks and highlights three key areas of evaluation intended to support robust AI deployment.