
AXRP - the AI X-risk Research Podcast
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Latest episodes

Nov 14, 2024 • 23min
38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
Zhijing Jin, an Assistant Professor at the University of Toronto, specializes in the intersection of natural language processing and causal inference. In this engaging discussion, she investigates whether language models truly understand causality or just recognize correlations. Zhijing explores the limitations of these models in reasoning, their application in multi-agent systems, and the complexities of digital societies. She poses intriguing questions about AI governance and cooperation, emphasizing the delicate balance required for sustainable agent interactions.

Oct 4, 2024 • 1h 44min
37 - Jaime Sevilla on AI Forecasting
Jaime Sevilla, Director of Epoch AI, dives into the intricacies of AI forecasting and compute trends. He discusses the exponential growth in computational power and its implications for AI development. The conversation highlights the tight relationship between algorithmic improvements and scaling, considering whether scaling is the key to achieving AGI. Sevilla also tackles challenges in GPU production and the importance of transparent AI training processes. Get ready for some thought-provoking insights into the future of artificial intelligence!

Sep 29, 2024 • 1h 48min
36 - Adam Shai and Paul Riechers on Computational Mechanics
Adam Shai, co-founder of Simplex AI Safety, dives into the realm of computational mechanics and its application to AI safety. He explores how computational mechanics can improve our understanding of neural network models, especially in predicting outcomes. The discussion covers the intriguing world models that transformers create and how fractals emerge in these networks. Shai also highlights the potential of combining insights from quantum information theory with computational mechanics to enhance AI interpretability.

Sep 28, 2024 • 6min
New Patreon tiers + MATS applications
Patreon: https://www.patreon.com/axrpodcast
MATS: https://www.matsprogram.org
Note: I'm employed by MATS, but they're not paying me to make this video.

Aug 24, 2024 • 2h 17min
35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing world of AI beliefs. He explores whether LLMs truly have beliefs and how to detect and edit them. A key focus is on the complexities of interpreting neural representations and the implications of belief localization. The conversation also covers the concept of easy-to-hard generalization, revealing insights on how AI tackles different task difficulties. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.

Jul 28, 2024 • 2h 14min
34 - AI Evaluations with Beth Barnes
Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. She discusses tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, emphasizing the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between different AI labs in security and monitoring.

Jun 12, 2024 • 1h 41min
33 - RLHF Problems with Scott Emmons
Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF): deceptive inflation, overjustification, bounded human rationality, and possible solutions. The conversation also touches on dimensional analysis and his broader research program, emphasizing the importance of addressing these challenges in AI systems.

May 30, 2024 • 2h 22min
32 - Understanding Agency with Jan Kulveit
Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.

May 7, 2024 • 2h 32min
31 - Singular Learning Theory with Daniel Murfet
Daniel Murfet, a researcher specializing in singular learning theory and Bayesian statistics, dives into the intricacies of deep learning models. He explains how singular learning theory enhances our understanding of learning dynamics and phase transitions in neural networks. The conversation explores local learning coefficients, their impact on model accuracy, and how singular learning theory compares with other frameworks. Murfet also discusses the potential for this theory to contribute to AI alignment, emphasizing interpretability and the challenges of integrating AI capabilities with human values.

Apr 30, 2024 • 2h 16min
30 - AI Security with Jeffrey Ladish
AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, the dangers of open LLMs, securing against attackers, and the state of computer security. He explores undoing safety filters, AI phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.