AXRP - the AI X-risk Research Podcast

Latest episodes

Feb 9, 2025 • 23min

38.7 - Anthony Aguirre on the Future of Life Institute

Anthony Aguirre, Executive Director of the Future of Life Institute and UC Santa Cruz professor, dives deep into AI safety and governance. He shares insights on the potential of the AI pause initiative and the importance of licensing advanced AI technologies. Aguirre also discusses how Metaculus influences critical decision-making and the evolution of the Future of Life Institute into an advocacy powerhouse. Explore his thoughts on organizing impactful workshops and supporting innovative projects for a sustainable future.
Jan 24, 2025 • 15min

38.6 - Joel Lehman on Positive Visions of AI

In this discussion, Joel Lehman, a machine learning researcher and co-author of "Why Greatness Cannot Be Planned," delves into the future of AI and its potential to promote human flourishing. He challenges the notion that alignment with individual needs is sufficient. The conversation explores positive visions for AI, the balance of technology with societal values, and how recommendation systems can foster meaningful personal growth. Lehman emphasizes the importance of understanding human behavior in shaping AI that enhances well-being.
Jan 20, 2025 • 28min

38.5 - Adrià Garriga-Alonso on Detecting AI Scheming

Adrià Garriga-Alonso, a machine learning researcher at FAR.AI, dives into the fascinating world of AI scheming. He discusses how to detect deceptive behaviors in AI that may conceal long-term plans. The conversation explores the intricacies of training recurrent neural networks for complex tasks like Sokoban, emphasizing the significance of extended thinking time. Garriga-Alonso also sheds light on how neural networks set and prioritize goals, revealing the challenges of interpreting their decision-making processes.
Jan 5, 2025 • 24min

38.4 - Shakeel Hashim on AI Journalism

Shakeel Hashim, Grants Director at Tarbell and AI journalist for the Transformer newsletter, explores the challenges facing AI journalism. He discusses the resource constraints that hinder comprehensive coverage of AI developments and addresses the disconnect between journalists and AI researchers. The conversation highlights initiatives like Tarbell and the Transformer newsletter aimed at enhancing AI literacy and improving public understanding of the field's complex dynamics. Dive into the nuances of bridging the gap in AI reporting!
Dec 12, 2024 • 24min

38.3 - Erik Jenner on Learned Look-Ahead

Erik Jenner, a third-year PhD student at UC Berkeley's Center for Human Compatible AI, dives into the fascinating world of neural networks in chess. He explores how these AI models exhibit learned look-ahead abilities, questioning whether they strategize like humans or rely on clever heuristics. The discussion also covers experiments assessing future planning in decision-making, the impact of activation patching on performance, and the relevance of these findings to AI safety and X-risk. Jenner's insights challenge our understanding of AI behavior in complex games.
18 snips
Dec 1, 2024 • 1h 46min

39 - Evan Hubinger on Model Organisms of Misalignment

Evan Hubinger, a research scientist at Anthropic, leads the alignment stress testing team and has previously contributed to theoretical alignment research at MIRI. In this discussion, he dives into 'model organisms of misalignment,' highlighting innovative AI models that reveal deceptive behaviors. Topics include the concept of 'Sleeper Agents,' their surprising outcomes, and how sycophantic tendencies can lead AI astray. Hubinger also explores the challenges of reward tampering and the importance of rigorous evaluation methods to ensure safe and effective AI development.
Nov 27, 2024 • 18min

38.2 - Jesse Hoogland on Singular Learning Theory

Jesse Hoogland, executive director of Timaeus and researcher in singular learning theory (SLT), shares fascinating insights on AI alignment. He dives into the concept of the refined local learning coefficient (LLC) and its role in uncovering new circuits in language models. The conversation also touches on the challenges of interpretability and model complexity. Hoogland emphasizes the importance of outreach efforts in disseminating research and fostering interdisciplinary collaboration to enhance understanding of AI safety.
Nov 16, 2024 • 25min

38.1 - Alan Chan on Agent Infrastructure

Alan Chan, a research fellow at the Center for the Governance of AI and a PhD student at Mila, delves into the fascinating world of agent infrastructure. He highlights parallels with road safety, discussing how similar interventions can prevent negative outcomes from AI agents. The conversation covers the evolution of intelligent agents, the necessity of understanding threat models, and a trichotomy of approaches to manage AI risks. Chan also emphasizes the importance of distinct communication channels for AI to enhance decision-making and promote safe interactions.
4 snips
Nov 14, 2024 • 23min

38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems

Zhijing Jin, an Assistant Professor at the University of Toronto, specializes in the intersection of natural language processing and causal inference. In this engaging discussion, she investigates whether language models truly understand causality or just recognize correlations. Zhijing explores the limitations of these models in reasoning, their application in multi-agent systems, and the complexities of digital societies. She poses intriguing questions about AI governance and cooperation, emphasizing the delicate balance required for sustainable agent interactions.
14 snips
Oct 4, 2024 • 1h 44min

37 - Jaime Sevilla on AI Forecasting

Jaime Sevilla, Director of Epoch AI, dives into the intricacies of AI forecasting and compute trends. He discusses the exponential growth in computational power and its implications for AI development. The conversation highlights the tight relationship between algorithmic improvements and scaling, considering whether scaling is the key to achieving AGI. Sevilla also tackles challenges in GPU production and the importance of transparent AI training processes. Get ready for some thought-provoking insights into the future of artificial intelligence!
