

AXRP - the AI X-risk Research Podcast
Daniel Filan
AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Episodes

Aug 24, 2024 • 2h 17min
35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing world of AI beliefs. He explores whether LLMs truly have beliefs and how to detect and edit them. A key focus is on the complexities of interpreting neural representations and the implications of belief localization. The conversation also covers the concept of easy-to-hard generalization, revealing insights on how AI tackles different task difficulties. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.

Jul 28, 2024 • 2h 14min
34 - AI Evaluations with Beth Barnes
Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. They discuss tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, emphasizing the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between different AI labs in security and monitoring.

Jun 12, 2024 • 1h 41min
33 - RLHF Problems with Scott Emmons
Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF): deceptive inflation, overjustification, bounded human rationality, and possible solutions. The conversation also touches on dimensional analysis and his broader research program, emphasizing the importance of addressing these challenges in AI systems.

May 30, 2024 • 2h 22min
32 - Understanding Agency with Jan Kulveit
Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.

May 7, 2024 • 2h 32min
31 - Singular Learning Theory with Daniel Murfet
Daniel Murfet, a researcher specializing in singular learning theory and Bayesian statistics, dives into the intricacies of deep learning models. He explains how singular learning theory enhances our understanding of learning dynamics and phase transitions in neural networks. The conversation explores local learning coefficients, their impact on model accuracy, and how singular learning theory compares with other frameworks. Murfet also discusses the potential for this theory to contribute to AI alignment, emphasizing interpretability and the challenges of integrating AI capabilities with human values.

Apr 30, 2024 • 2h 16min
30 - AI Security with Jeffrey Ladish
AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, the dangers of open LLMs, securing models against attackers, and the state of computer security. They explore undoing safety filters, AI phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.

Apr 25, 2024 • 2h 14min
29 - Science of Deep Learning with Vikrant Varma
Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.

Apr 17, 2024 • 1h 58min
28 - Suing Labs for AI Risk with Gabriel Weil
Gabriel Weil discusses using tort law to hold AI companies accountable for disasters, comparing it to regulation and Pigouvian taxation. They talk about warning shots, legal changes, interactions with other laws, and the feasibility of liability reform. The conversation also touches on the technical research needed to support this proposal and its potential impact on decision-making in the AI field.

Apr 11, 2024 • 2h 56min
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
Buck Shlegeris and Ryan Greenblatt discuss AI control: mechanisms for preventing AI from taking over the world. They cover topics such as protocols for AI control, preventing dangerous coded communication between AIs, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.

Nov 26, 2023 • 1h 57min
26 - AI Governance with Elizabeth Seger
Elizabeth Seger, a researcher specializing in AI governance, discusses the importance of democratizing AI and the risks of open-sourcing powerful AI systems. They explore the offense-defense balance, the concept of AI governance, and alternative methods for open-sourcing AI models. They also highlight the role of technical alignment researchers in improving AI governance.