

#10: Stephen Casper on Technical and Sociotechnical AI Safety Research
Aug 2, 2024
Stephen Casper, a PhD student at MIT specializing in AI safety, discusses AI interpretability and the challenge of deceptive alignment. He explains why unobservable failures in AI systems are so hard to catch, emphasizing the importance of robust evaluations and audits. The discussion also touches on Goodhart's law, illustrating the risks of optimizing for profit at the expense of societal well-being, and the pressing need for governance that keeps pace with AI advancements.
Chapters
Intro
00:00 • 2min
Understanding Unobservable Failures in AI Safety
01:58 • 3min
Understanding AI Alignment and Deceptive Alignment
05:23 • 2min
Navigating AI Interpretability and Safety
07:12 • 20min
Evaluating AI: Governance and Auditing Challenges
27:41 • 12min
Navigating AI Safety Challenges
39:50 • 17min
Exploring Goodhart's Law in Corporate Practices
56:35 • 3min