
AXRP - the AI X-risk Research Podcast
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Latest episodes

Mar 28, 2025 • 2h 36min
40 - Jason Gross on Compact Proofs and Interpretability
In this conversation, Jason Gross, a researcher in mechanistic interpretability and software verification, dives into the world of compact proofs. He discusses their role as a benchmark for AI interpretability and how they can be used to prove guarantees about model performance. The conversation also touches on the challenges of randomness and noise in neural networks, the intersection of proofs and modern machine learning, and innovative approaches to enhancing AI reliability. Plus, learn about his startup focused on automating proof generation and the road ahead for AI safety!

Mar 1, 2025 • 21min
38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
In this discussion, David Duvenaud, a University of Toronto professor specializing in probabilistic deep learning who also works on AI safety at Anthropic, dives into the challenges of assessing whether AI models could sabotage human decisions. He shares insights on the complexities of sabotage evaluations and the strategies needed for effective oversight. The conversation shifts to the societal impacts of a post-AGI world, reflecting on potential job implications and the delicate balance between AI advancement and prioritizing human values.

Feb 9, 2025 • 23min
38.7 - Anthony Aguirre on the Future of Life Institute
Anthony Aguirre, Executive Director of the Future of Life Institute and UC Santa Cruz professor, dives deep into AI safety and governance. He shares insights on the potential of the AI pause initiative and the importance of licensing advanced AI technologies. Aguirre also discusses how Metaculus influences critical decision-making and the evolution of the Future of Life Institute into an advocacy powerhouse. Explore his thoughts on organizing impactful workshops and supporting innovative projects for a sustainable future.

Jan 24, 2025 • 15min
38.6 - Joel Lehman on Positive Visions of AI
In this discussion, Joel Lehman, a machine learning researcher and co-author of "Why Greatness Cannot Be Planned," delves into the future of AI and its potential to promote human flourishing. He challenges the notion that alignment with individual needs is sufficient. The conversation explores positive visions for AI, the balance of technology with societal values, and how recommendation systems can foster meaningful personal growth. Lehman emphasizes the importance of understanding human behavior in shaping AI that enhances well-being.

Jan 20, 2025 • 28min
38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, dives into the fascinating world of AI scheming. He discusses how to detect deceptive behaviors in AI that may conceal long-term plans. The conversation explores the intricacies of training recurrent neural networks for complex tasks like Sokoban, emphasizing the significance of extended thinking time. Garriga-Alonso also sheds light on how neural networks set and prioritize goals, revealing the challenges of interpreting their decision-making processes.

Jan 5, 2025 • 24min
38.4 - Shakeel Hashim on AI Journalism
Shakeel Hashim, Grants Director at Tarbell and AI journalist for the Transformer newsletter, explores the challenges facing AI journalism. He discusses the resource constraints that hinder comprehensive coverage of AI developments and addresses the disconnect between journalists and AI researchers. The conversation highlights initiatives like Tarbell and the Transformer newsletter aimed at enhancing AI literacy and improving public understanding of the field's complex dynamics. Dive into the nuances of bridging the gap in AI reporting!

Dec 12, 2024 • 24min
38.3 - Erik Jenner on Learned Look-Ahead
Erik Jenner, a third-year PhD student at UC Berkeley's Center for Human Compatible AI, dives into the fascinating world of neural networks in chess. He explores how these AI models exhibit learned look-ahead abilities, questioning whether they strategize like humans or rely on clever heuristics. The discussion also covers experiments assessing future planning in decision-making, the impact of activation patching on performance, and the relevance of these findings to AI safety and X-risk. Jenner's insights challenge our understanding of AI behavior in complex games.

Dec 1, 2024 • 1h 46min
39 - Evan Hubinger on Model Organisms of Misalignment
Evan Hubinger, a research scientist at Anthropic, leads the alignment stress testing team and has previously contributed to theoretical alignment research at MIRI. In this discussion, he dives into 'model organisms of misalignment,' highlighting innovative AI models that reveal deceptive behaviors. Topics include the concept of 'Sleeper Agents,' their surprising outcomes, and how sycophantic tendencies can lead AI astray. Hubinger also explores the challenges of reward tampering and the importance of rigorous evaluation methods to ensure safe and effective AI development.

Nov 27, 2024 • 18min
38.2 - Jesse Hoogland on Singular Learning Theory
Jesse Hoogland, executive director of Timaeus and researcher in singular learning theory (SLT), shares fascinating insights on AI alignment. He dives into the concept of the refined local learning coefficient (LLC) and its role in uncovering new circuits in language models. The conversation also touches on the challenges of interpretability and model complexity. Hoogland emphasizes the importance of outreach efforts in disseminating research and fostering interdisciplinary collaboration to enhance understanding of AI safety.

Nov 16, 2024 • 25min
38.1 - Alan Chan on Agent Infrastructure
Alan Chan, a research fellow at the Center for the Governance of AI and a PhD student at Mila, delves into the fascinating world of agent infrastructure. He highlights parallels with road safety, discussing how similar interventions can prevent negative outcomes from AI agents. The conversation covers the evolution of intelligent agents, the necessity of understanding threat models, and a trichotomy of approaches to manage AI risks. Chan also emphasizes the importance of distinct communication channels for AI to enhance decision-making and promote safe interactions.