#10: Stephen Casper on Technical and Sociotechnical AI Safety Research
Aug 2, 2024
Stephen Casper, a PhD student at MIT specializing in AI safety, dives into the intricacies of AI interpretability and the looming challenges of deceptive alignment. He explains the subtle complexities behind unobservable failures in AI systems, emphasizing the importance of robust evaluations and audits. The discussion also touches on Goodhart's law, illustrating the risks of prioritizing profit over societal well-being, as well as the pressing need for effective governance alongside AI advancements.
AI systems face observable and unobservable failures, necessitating a focus on non-standard machine learning research to address complex issues.
Interpretability research must prioritize practical applications to enhance engineers' understanding of AI systems and ensure safer deployment.
Reinforcement Learning from Human Feedback has inherent challenges that require exploration of alternative strategies for effective AI alignment.
Deep dives
Understanding AI Failures
AI systems can experience two types of failures: observable failures, which developers can detect through testing and red teaming, and unobservable failures, which are harder to identify and often missed during development. Observable failures can be addressed with standard machine learning techniques, since typical evaluation processes surface them. Unobservable failures, by contrast, can involve subtle biases or deceptive alignment, both of which are difficult to surface before deployment. This distinction highlights the limitations of current AI development practices and underscores the need for research on non-standard machine learning problems.
The Role of Interpretability Research
Interpretability research aims to uncover the internal workings of AI models, enabling engineers to gain insights into how these systems operate. Despite its importance, the field has often struggled to produce practical tools that assist engineers in real-world applications. There is a growing need for interpretability techniques to be useful in system design and engineering, rather than merely facilitating academic understanding. By adopting a more utility-focused perspective, interpretability research can become instrumental in ensuring safer and more effective AI systems.
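To make this concrete, here is a minimal, hypothetical sketch of one common interpretability technique, a linear probe trained on a model's hidden activations. The episode discusses interpretability broadly, so the toy model, synthetic data, and probe below are illustrative assumptions rather than anything Casper describes.

```python
# A minimal sketch of a linear probe: train a small classifier on a model's
# hidden activations to test whether they encode a property of interest.
# The model, data, and property here are toy stand-ins.
import torch
import torch.nn as nn

# A toy "model" whose internals we want to inspect.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Synthetic inputs and a property we suspect the model represents internally.
x = torch.randn(256, 10)
property_labels = (x[:, 0] > 0).long()  # e.g., the sign of one input feature

# Capture hidden-layer activations.
with torch.no_grad():
    hidden = model[1](model[0](x))

# Train the probe: if it predicts the property well, the hidden layer encodes
# that information in a linearly accessible way.
probe = nn.Linear(32, 2)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    loss = nn.functional.cross_entropy(probe(hidden), property_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

accuracy = (probe(hidden).argmax(dim=1) == property_labels).float().mean().item()
print(f"probe accuracy: {accuracy:.2f}")
```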
Challenges of Reinforcement Learning from Human Feedback
Reinforcement Learning from Human Feedback (RLHF) has become a key method for aligning chatbots and other AI systems with human preferences, yet it faces significant challenges. These include gathering human feedback effectively, training reward models that accurately capture that feedback, and ensuring that the optimized policies truly reflect society's diverse values. Some of these challenges are technical and can be mitigated with better methods, while others are fundamental limitations of the RLHF paradigm that are likely to persist. Exploring alternatives to RLHF could help develop more robust AI alignment strategies that address these inherent issues.
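As a rough illustration of the pipeline described above, the sketch below walks through the three RLHF stages with toy tensors standing in for prompts and completions. The tiny reward model, shapes, and training loop are assumptions made for this example, not details from the conversation.

```python
# Minimal sketch of the three RLHF stages: collect preference data, fit a
# reward model, then score candidate outputs with it. Toy tensors stand in
# for real text; everything here is illustrative.
import torch
import torch.nn as nn

# Stage 1: collect human feedback as preference pairs.
dim = 16
chosen = torch.randn(32, dim)    # embeddings of completions humans preferred
rejected = torch.randn(32, dim)  # embeddings of completions humans rejected

# Stage 2: fit a reward model so that r(chosen) > r(rejected).
reward_model = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(200):
    # Bradley-Terry-style loss: -log sigmoid(r_chosen - r_rejected)
    loss = -nn.functional.logsigmoid(
        reward_model(chosen) - reward_model(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 3: optimize a policy against the learned reward (here, just rank
# candidates). This is where over-optimizing an imperfect reward model, a
# Goodhart's-law failure, can creep in.
candidates = torch.randn(8, dim)
scores = reward_model(candidates).squeeze(-1)
print("Highest-scoring candidate:", scores.argmax().item())
```

The sketch stops at scoring candidates rather than running a full policy-gradient loop, but it shows where each of the challenges mentioned above enters: noisy feedback in stage 1, reward misspecification in stage 2, and over-optimization in stage 3.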
Adversarial AI and Safety
Adversarial machine learning showcases how small, often imperceptible changes to inputs can lead to drastic performance failures in AI systems. This research area is vital for uncovering vulnerabilities and enhancing the robustness of AI technologies. Red teaming, where benevolent attackers identify these vulnerabilities, plays a crucial role in improving system safety. By studying adversarial attacks and developing effective defenses, researchers can bridge the gap between interpretability and adversarial robustness to create safer AI applications.
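The sketch below illustrates the core idea with a fast-gradient-sign-style perturbation against a toy classifier. The random model and data are stand-ins chosen for this example, and this particular attack is just one common instance of the broader adversarial-attack family discussed here.

```python
# Sketch of an FGSM-style attack on a toy classifier, showing how a small
# input perturbation in the direction of the loss gradient can change a
# prediction. Model and data are random stand-ins, not a benchmark.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x = torch.randn(1, 20, requires_grad=True)
label = torch.tensor([0])

# Compute the loss gradient with respect to the input.
loss = nn.functional.cross_entropy(model(x), label)
loss.backward()

# Take a small step in the direction that increases the loss; this may be
# enough to flip the model's prediction.
epsilon = 0.1
x_adv = x + epsilon * x.grad.sign()

print("original prediction:", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```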
The Importance of Audits and Access
Rigorous audits of AI systems are essential for ensuring safety and compliance, yet the level of access auditors have to these systems strongly shapes how effective they can be. Current practice often limits auditors to black-box access, which constrains the depth of their evaluations. Granting auditors greater access, whether outside-the-box (for example, to training data and documentation) or white-box (to model internals), can empower them to identify risks and vulnerabilities more effectively. This highlights a critical need for auditing frameworks that balance security concerns against comprehensive oversight in AI development.
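As a loose illustration of how access level shapes what an auditor can do, the sketch below contrasts black-box queries with white-box inspection of the same toy model. The functions and model are hypothetical, and outside-the-box access (to training data, documentation, and the like) has no direct code analogue.

```python
# Toy contrast between two auditor access levels for the same model:
# black-box query access versus white-box access to weights and gradients.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(4, 8)
labels = torch.zeros(4, dtype=torch.long)

def black_box_query(inputs):
    """Black-box: the auditor can only submit inputs and observe outputs."""
    with torch.no_grad():
        return model(inputs).argmax(dim=1)

def white_box_view(inputs, targets):
    """White-box: the auditor can inspect weights and compute gradients,
    enabling deeper analyses such as gradient-based attacks or probing."""
    loss = nn.functional.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, model.parameters())
    weights = {name: p.detach() for name, p in model.named_parameters()}
    return weights, grads

print("black-box predictions:", black_box_query(x))
```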
Stephen Casper, a computer science PhD student at MIT, joined the podcast to discuss AI interpretability, red-teaming and robustness, evaluations and audits, reinforcement learning from human feedback, Goodhart’s law, and more.