80,000 Hours Podcast

#226 – Holden Karnofsky on unexploited opportunities to make AI safer — and all his AGI takes

Oct 30, 2025
Holden Karnofsky is the co-founder of GiveWell and Open Philanthropy and currently advises on AI risk at Anthropic. He shares exciting, actionable projects in AI safety, emphasizing the shift from theory to hands-on work. Topics include training AI to detect deception, implementing security against backdoors, and promoting model welfare. Holden discusses how AI companies can foster positive AGI development and offers insight into career paths in AI safety, urging listeners to recognize their potential impact.
INSIGHT

We’re Creating A Second Species

  • Creating a second intelligent species (AGI) is unprecedented, and terrifying if those systems turn out to be misaligned.
  • Holden argues that many actors won't slow down, so pausing alone won't prevent dangerous progress.
INSIGHT

Cheap Safety Measures Are Underused

  • Many cheap, practical safety measures exist that firms could adopt without losing competitiveness.
  • Making them standard is tractable and could materially reduce risk even absent heavy regulation.
ADVICE

Keep Investigative Logs, Not Zero Retention

  • Preserve some conversation and interaction logs securely so that suspicious incidents can be investigated after the fact.
  • Implement middle-ground retention policies (short-term secure storage, opt-in sharing) that balance privacy against forensic needs; one possible shape is sketched below.
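
The episode doesn't prescribe an implementation, but a minimal sketch of such a middle-ground policy might look like the following. The 30-day window and the per-entry `opt_in_sharing` and `investigation_hold` flags are illustrative assumptions, not details from the episode:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Assumed retention window: logs are kept briefly for forensics, then
# purged unless the user opted in to longer sharing or the entry is
# under an active investigation hold. (Illustrative, not from the episode.)
RETENTION_WINDOW = timedelta(days=30)

@dataclass
class LogEntry:
    conversation_id: str
    stored_at: datetime
    opt_in_sharing: bool = False      # user consented to longer retention
    investigation_hold: bool = False  # flagged while an incident is reviewed

@dataclass
class RetentionStore:
    entries: list[LogEntry] = field(default_factory=list)

    def add(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def purge_expired(self, now: datetime) -> None:
        """Drop entries past the window unless opted in or on hold."""
        self.entries = [
            e for e in self.entries
            if e.opt_in_sharing
            or e.investigation_hold
            or now - e.stored_at <= RETENTION_WINDOW
        ]

# Example: logs age out automatically, but a flagged entry survives purges
# until the hold is lifted.
store = RetentionStore()
store.add(LogEntry("conv-123", stored_at=datetime.now(timezone.utc)))
store.purge_expired(now=datetime.now(timezone.utc))
```

The point of the sketch is that "zero retention" and "keep everything" aren't the only options: a short default window plus explicit holds gives investigators a trail for recent incidents without amassing long-lived user data.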