80,000 Hours Podcast

#214 – Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway

Apr 4, 2025
Buck Shlegeris, CEO of Redwood Research, dives into AI control: safety mechanisms designed to mitigate the risks of misaligned AI. He shares insights on developing protocols for deploying advanced AIs that might act against human interests, and argues that practical control measures are less complex to implement than they might seem. The discussion highlights the urgent need for robust safeguards in AI deployment, the ethical questions raised by using potentially misaligned systems, and the challenges of monitoring AI, underscoring the case for a proactive approach to safety.
INSIGHT

AI Control vs. Alignment

  • AI control aims to mitigate catastrophic risk by assuming AIs may be misaligned, rather than relying on successfully aligning them.
  • It focuses on deploying and using AIs safely despite that potential misalignment (see the sketch after these points).
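
To make the idea a bit more concrete, here is a minimal, hypothetical sketch of what a control-style deployment loop could look like: an untrusted model proposes actions, a trusted monitor scores them for suspiciousness, and suspicious actions are blocked or escalated to a human before anything runs. The function names, thresholds, and Python framing are illustrative assumptions, not details from the episode.

```python
"""Hypothetical control-protocol sketch (illustrative only, not from the episode):
never execute the untrusted model's proposal directly; screen it with a trusted
monitor and escalate mid-range scores to a human auditor."""

from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str  # what the untrusted model says it wants to do
    command: str      # the concrete command or API call it proposes


def control_loop(
    untrusted_propose: Callable[[str], ProposedAction],
    trusted_monitor: Callable[[ProposedAction], float],
    execute: Callable[[ProposedAction], str],
    escalate_to_human: Callable[[ProposedAction], bool],
    task: str,
    block_threshold: float = 0.8,   # illustrative threshold
    audit_threshold: float = 0.5,   # illustrative threshold
) -> str:
    """Run one task under a simple control protocol.

    The untrusted model is assumed to be potentially misaligned, so its
    proposed action is scored by a trusted (weaker but reliable) monitor
    before it is allowed to run.
    """
    action = untrusted_propose(task)
    suspicion = trusted_monitor(action)  # 0.0 = benign, 1.0 = clearly malicious

    if suspicion >= block_threshold:
        return f"BLOCKED (suspicion={suspicion:.2f}): {action.description}"
    if suspicion >= audit_threshold:
        # Mid-range scores go to a human auditor before anything executes.
        if not escalate_to_human(action):
            return f"REJECTED by auditor: {action.description}"
    return execute(action)


# Toy usage with stubbed components.
if __name__ == "__main__":
    propose = lambda task: ProposedAction(f"list files for: {task}", "ls -la")
    monitor = lambda a: 0.9 if "rm -rf" in a.command else 0.1
    run = lambda a: f"ran `{a.command}` safely"
    human = lambda a: True
    print(control_loop(propose, monitor, run, human, "inspect repo"))
```
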
INSIGHT

Data Center Hacking

  • AIs hacking the data centers they already run in is a bigger concern than their weights escaping, because that is where compute is concentrated.
  • With internal access, a misaligned AI could influence the training of future models, contribute to weapons development, or sabotage work.
INSIGHT

AI Control Resurgence

  • Interest in AI control has resurged because powerful AI appears closer and the outlook for safety investment has become less optimistic.
  • With limited resources and rushed timelines, prioritizing harm reduction becomes necessary.