AXRP - the AI X-risk Research Podcast

Latest episodes

Mar 28, 2025 • 2h 36min

40 - Jason Gross on Compact Proofs and Interpretability

In this episode, Jason Gross, a researcher in mechanistic interpretability and software verification, discusses compact proofs: their role as a benchmark for AI interpretability and how they can be used to prove claims about model performance. The conversation also covers the challenges that randomness and noise in neural networks pose for such proofs, the intersection of formal proofs and modern machine learning, and approaches to making AI systems more reliable. He also describes his startup, which focuses on automating proof generation, and the road ahead for AI safety.
Mar 1, 2025 • 21min

38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

David Duvenaud, a University of Toronto professor who specializes in probabilistic deep learning and has worked on AI safety at Anthropic, discusses the challenge of assessing whether AI models could sabotage human decisions. He explains what makes sabotage evaluations complex and what strategies effective oversight requires. The conversation then turns to the societal impact of a post-AGI world, including its implications for jobs and the balance between advancing AI and prioritizing human values.
Feb 9, 2025 • 23min

38.7 - Anthony Aguirre on the Future of Life Institute

Anthony Aguirre, Executive Director of the Future of Life Institute and a professor at UC Santa Cruz, discusses AI safety and governance. He shares his views on the AI pause initiative and the case for licensing advanced AI technologies. Aguirre also discusses how Metaculus, the forecasting platform he co-founded, can inform critical decisions, and how the Future of Life Institute has evolved into an advocacy organization. He also reflects on organizing impactful workshops and supporting innovative projects aimed at a sustainable future.
Jan 24, 2025 • 15min

38.6 - Joel Lehman on Positive Visions of AI

Joel Lehman, a machine learning researcher and co-author of "Why Greatness Cannot Be Planned," discusses the future of AI and its potential to promote human flourishing. He argues that aligning AI with individual needs is not sufficient on its own. The conversation explores positive visions for AI, how to balance technology with societal values, and how recommendation systems could foster meaningful personal growth. Lehman emphasizes that understanding human behavior is key to building AI that enhances well-being.
Jan 20, 2025 • 28min

38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, discusses AI scheming and how to detect deceptive behavior in AI systems that may be concealing long-term plans. The conversation covers training recurrent neural networks on complex planning tasks such as Sokoban, and the significance of giving these networks extra thinking time. Garriga-Alonso also explains how neural networks set and prioritize goals, and why their decision-making processes are difficult to interpret.
Jan 5, 2025 • 24min

38.4 - Shakeel Hashim on AI Journalism

Shakeel Hashim, Grants Director at Tarbell and author of the Transformer newsletter on AI, discusses the challenges facing AI journalism. He describes the resource constraints that prevent comprehensive coverage of AI developments and the disconnect between journalists and AI researchers. The conversation highlights initiatives such as Tarbell and the Transformer newsletter, which aim to improve AI literacy, deepen public understanding of the field, and bridge the gaps in AI reporting.
Dec 12, 2024 • 24min

38.3 - Erik Jenner on Learned Look-Ahead

Erik Jenner, a third-year PhD student at UC Berkeley's Center for Human-Compatible AI, discusses neural networks that play chess. He explores whether these models have learned look-ahead, that is, whether they plan ahead the way humans do or instead rely on clever heuristics. The discussion also covers experiments that test for future planning in the models' decisions, the effect of activation patching on performance, and the relevance of these findings to AI safety and X-risk. Jenner's results challenge common assumptions about how AI systems behave in complex games.
Dec 1, 2024 • 1h 46min

39 - Evan Hubinger on Model Organisms of Misalignment

Evan Hubinger, a research scientist at Anthropic, leads the alignment stress-testing team and previously worked on theoretical alignment research at MIRI. In this episode, he discusses "model organisms of misalignment": AI models deliberately built to exhibit misaligned behavior, such as deception, so that it can be studied. Topics include the "Sleeper Agents" work and its surprising results, and how sycophantic tendencies can lead AI models astray. Hubinger also explores the challenge of reward tampering and the importance of rigorous evaluation methods for safe and effective AI development.
Nov 27, 2024 • 18min

38.2 - Jesse Hoogland on Singular Learning Theory

Jesse Hoogland, executive director of Timaeus and a researcher in singular learning theory (SLT), discusses what SLT can contribute to AI alignment. He explains the refined local learning coefficient (rLLC) and how it can be used to uncover new circuits in language models. The conversation also touches on the challenges of interpretability and of measuring model complexity. Hoogland emphasizes the importance of outreach for disseminating research and of interdisciplinary collaboration for advancing AI safety.
Nov 16, 2024 • 25min

38.1 - Alan Chan on Agent Infrastructure

Alan Chan, a research fellow at the Centre for the Governance of AI and a PhD student at Mila, discusses agent infrastructure. He draws a parallel with road safety, discussing how analogous interventions could prevent negative outcomes from AI agents. The conversation covers how AI agents are evolving, why understanding threat models matters, and a trichotomy of approaches to managing AI risk. Chan also emphasizes the value of distinct communication channels for AI agents to improve decision-making and promote safe interactions.