

AXRP - the AI X-risk Research Podcast
Daniel Filan
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Episodes

Jan 20, 2025 • 28min
38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, dives into the fascinating world of AI scheming. He discusses how to detect deceptive behaviors in AI that may conceal long-term plans. The conversation explores the intricacies of training recurrent neural networks for complex tasks like Sokoban, emphasizing the significance of extended thinking time. Garriga-Alonso also sheds light on how neural networks set and prioritize goals, revealing the challenges of interpreting their decision-making processes.

Jan 5, 2025 • 24min
38.4 - Shakeel Hashim on AI Journalism
Shakeel Hashim, Grants Director at Tarbell and AI journalist for the Transformer newsletter, explores the challenges facing AI journalism. He discusses the resource constraints that hinder comprehensive coverage of AI developments and the disconnect between journalists and AI researchers. The conversation highlights initiatives like Tarbell and the Transformer newsletter, which aim to improve AI literacy and public understanding of the field's complex dynamics.

Dec 12, 2024 • 24min
38.3 - Erik Jenner on Learned Look-Ahead
Erik Jenner, a third-year PhD student at UC Berkeley's Center for Human-Compatible AI, dives into the fascinating world of neural networks in chess. He explores how these AI models exhibit learned look-ahead abilities, questioning whether they strategize like humans or rely on clever heuristics. The discussion also covers experiments assessing future planning in decision-making, the impact of activation patching on performance, and the relevance of these findings to AI safety and x-risk. Jenner's insights challenge our understanding of AI behavior in complex games.

Dec 1, 2024 • 1h 46min
39 - Evan Hubinger on Model Organisms of Misalignment
Evan Hubinger, a research scientist at Anthropic, leads the alignment stress testing team and has previously contributed to theoretical alignment research at MIRI. In this discussion, he dives into 'model organisms of misalignment,' highlighting innovative AI models that reveal deceptive behaviors. Topics include the concept of 'Sleeper Agents,' their surprising outcomes, and how sycophantic tendencies can lead AI astray. Hubinger also explores the challenges of reward tampering and the importance of rigorous evaluation methods to ensure safe and effective AI development.

Nov 27, 2024 • 18min
38.2 - Jesse Hoogland on Singular Learning Theory
Jesse Hoogland, executive director of Timaeus and researcher in singular learning theory (SLT), shares fascinating insights on AI alignment. He dives into the concept of the refined local learning coefficient (LLC) and its role in uncovering new circuits in language models. The conversation also touches on the challenges of interpretability and model complexity. Hoogland emphasizes the importance of outreach efforts in disseminating research and fostering interdisciplinary collaboration to enhance understanding of AI safety.

Nov 16, 2024 • 25min
38.1 - Alan Chan on Agent Infrastructure
Alan Chan, a research fellow at the Center for the Governance of AI and a PhD student at Mila, delves into the fascinating world of agent infrastructure. He highlights parallels with road safety, discussing how similar interventions can prevent negative outcomes from AI agents. The conversation covers the evolution of intelligent agents, the necessity of understanding threat models, and a trichotomy of approaches to manage AI risks. Chan also emphasizes the importance of distinct communication channels for AI to enhance decision-making and promote safe interactions.

Nov 14, 2024 • 23min
38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
Zhijing Jin, an Assistant Professor at the University of Toronto, specializes in the intersection of natural language processing and causal inference. In this engaging discussion, she investigates whether language models truly understand causality or just recognize correlations. Zhijing explores the limitations of these models in reasoning, their application in multi-agent systems, and the complexities of digital societies. She poses intriguing questions about AI governance and cooperation, emphasizing the delicate balance required for sustainable agent interactions.

Oct 4, 2024 • 1h 44min
37 - Jaime Sevilla on AI Forecasting
Jaime Sevilla, Director of Epoch AI, dives into the intricacies of AI forecasting and compute trends. He discusses the exponential growth in computational power and its implications for AI development. The conversation highlights the tight relationship between algorithmic improvements and scaling, and considers whether scaling is the key to achieving AGI. Sevilla also tackles challenges in GPU production and the importance of transparent AI training processes.

Sep 29, 2024 • 1h 48min
36 - Adam Shai and Paul Riechers on Computational Mechanics
Adam Shai, co-founder of Simplex AI Safety, dives into the realm of computational mechanics and its application to AI safety. He explores how computational mechanics can improve our understanding of neural network models, especially in predicting outcomes. The discussion covers the intriguing world models that transformers create and how fractals emerge in these networks. Shai also highlights the potential of combining insights from quantum information theory with computational mechanics to enhance AI interpretability.

Sep 28, 2024 • 6min
New Patreon tiers + MATS applications
Patreon: https://www.patreon.com/axrpodcast
MATS: https://www.matsprogram.org
Note: I'm employed by MATS, but they're not paying me to make this video.