
AXRP - the AI X-risk Research Podcast

38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

Mar 1, 2025
In this discussion, David Duvenaud, a University of Toronto professor specializing in probabilistic deep learning who recently worked on AI safety at Anthropic, dives into the challenges of assessing whether AI models could sabotage human decisions. He shares insights on the complexities of sabotage evaluations and the strategies needed for effective oversight. The conversation then shifts to the societal impacts of a post-AGI world, reflecting on potential job implications and the delicate balance between advancing AI and preserving human values.

Quick takeaways

  • David Duvenaud discusses the complexities of sabotage evaluations, emphasizing the need for holistic approaches to assess AI models' potential risks.
  • He raises concerns about humanity's diminishing influence in a post-AGI world, advocating for societal adaptations to keep humans central in decision-making processes.

Deep dives

David Duvenaud's Contributions to AI Safety

David Duvenaud, a professor at the University of Toronto, has focused on AI safety since 2017. His recent work includes a year-and-a-half sabbatical at Anthropic, where he led a team known as Alignment Evals. This team developed evaluations to assess whether AI models could secretly sabotage human decision-making or mislead evaluators about their capabilities. The aim was to establish a framework for understanding and mitigating the risks posed by advanced AI models.
