AXRP - the AI X-risk Research Podcast

38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

Mar 1, 2025
In this discussion, David Duvenaud, a University of Toronto professor who specializes in probabilistic deep learning and has worked on AI safety at Anthropic, discusses the challenges of assessing whether AI models could sabotage human decisions. He shares insights on the complexities of sabotage evaluations and the strategies needed for effective oversight. The conversation then turns to the societal impacts of a post-AGI world, including the implications for jobs and the delicate balance between advancing AI and prioritizing human values.
20:42

Podcast summary created with Snipd AI

Quick takeaways

  • David Duvenaud discusses the complexities of sabotage evaluations, emphasizing the need for holistic approaches to assess AI models' potential risks.
  • He raises concerns about humanity's diminishing influence in a post-AGI world, advocating for societal adaptations to keep humans central in decision-making processes.

Deep dives

David Duvenaud's Contributions to AI Safety

David Duvenaud, a professor at the University of Toronto, has focused increasingly on AI safety since 2017. His recent work includes a year-and-a-half sabbatical at Anthropic, where he led a team known as Alignment Evals. This team developed evaluations to assess whether AI models could secretly sabotage human decision-making or mislead evaluators about their capabilities. The aim was to establish a framework for understanding and mitigating potential risks associated with advanced AI models.
