AXRP - the AI X-risk Research Podcast

Daniel Filan
28 snips
Aug 24, 2024 • 2h 17min

35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing question of AI beliefs. He explores whether LLMs truly have beliefs, and how such beliefs can be detected and edited. A key focus is the difficulty of interpreting neural representations and the implications of localizing beliefs within a network. The conversation also covers easy-to-hard generalization, with insights into how models trained on easier tasks handle harder ones. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.
43 snips
Jul 28, 2024 • 2h 14min

34 - AI Evaluations with Beth Barnes

Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. She discusses tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, and emphasizes the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping a science of evaluations, as well as disparities between AI labs in security and monitoring.
5 snips
Jun 12, 2024 • 1h 41min

33 - RLHF Problems with Scott Emmons

Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF): deceptive inflation, overjustification, and bounded human rationality, along with possible solutions. The conversation also touches on dimensional analysis and his broader research program, emphasizing why addressing these challenges matters for AI systems.
May 30, 2024 • 2h 22min

32 - Understanding Agency with Jan Kulveit

Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.
May 7, 2024 • 2h 32min

31 - Singular Learning Theory with Daniel Murfet

Daniel Murfet, a researcher specializing in singular learning theory and Bayesian statistics, dives into the intricacies of deep learning models. He explains how singular learning theory enhances our understanding of learning dynamics and phase transitions in neural networks. The conversation explores local learning coefficients, their impact on model accuracy, and how singular learning theory compares with other frameworks. Murfet also discusses the potential for this theory to contribute to AI alignment, emphasizing interpretability and the challenges of integrating AI capabilities with human values.
25 snips
Apr 30, 2024 • 2h 16min

30 - AI Security with Jeffrey Ladish

AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, the dangers of open LLMs, securing AI systems against attackers, and the state of computer security. The conversation explores undoing safety filters, AI-powered phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.
Apr 25, 2024 • 2h 14min

29 - Science of Deep Learning with Vikrant Varma

Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.
Apr 17, 2024 • 1h 58min

28 - Suing Labs for AI Risk with Gabriel Weil

Gabriel Weil discusses using tort law to hold AI companies accountable for disasters, comparing it to regulations and Pigouvian taxation. They talk about warning shots, legal changes, interactions with other laws, and the feasibility of liability reform. The conversation also touches on the technical research needed to support this proposal and the potential impact on decision-making in the AI field.
106 snips
Apr 11, 2024 • 2h 56min

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

Buck Shlegeris and Ryan Greenblatt discuss AI control: mechanisms for preventing AI from taking over the world. They cover protocols for controlling AI, preventing AIs from communicating through hidden codes, when AI might become uncontrollable in unpredictable ways, and the impact of AI control on the broader AI safety field.
16 snips
Nov 26, 2023 • 1h 57min

26 - AI Governance with Elizabeth Seger

Elizabeth Seger, a researcher specializing in AI governance, discusses the importance of democratizing AI and the risks of open-sourcing powerful AI systems. She and Daniel explore the offense-defense balance, what AI governance encompasses, and alternative methods for achieving the benefits of open-sourcing AI models. They also highlight the role that technical alignment researchers can play in improving AI governance.
