Center for AI Policy Podcast

#10: Stephen Casper on Technical and Sociotechnical AI Safety Research

Aug 2, 2024
Stephen Casper, a PhD student at MIT specializing in AI safety, dives into the intricacies of AI interpretability and the looming challenges of deceptive alignment. He explains the subtle complexities behind unobservable failures in AI systems, emphasizing the importance of robust evaluations and audits. The discussion also touches on Goodhart's law, illustrating the risks of prioritizing profit over societal well-being, as well as the pressing need for effective governance alongside AI advancements.