
AXRP - the AI X-risk Research Podcast

Latest episodes

Jul 28, 2024 • 2h 14min

34 - AI Evaluations with Beth Barnes

Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. They discuss tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, emphasizing the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between different AI labs in security and monitoring.
Jun 12, 2024 • 1h 41min

33 - RLHF Problems with Scott Emmons

Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF): deceptive inflation, overjustification, and bounded human rationality, along with possible solutions. He also touches on dimensional analysis and his research program, emphasizing the importance of addressing these challenges in AI systems.
May 30, 2024 • 2h 22min

32 - Understanding Agency with Jan Kulveit

Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.
May 7, 2024 • 2h 32min

31 - Singular Learning Theory with Daniel Murfet

Daniel Murfet, a researcher specializing in singular learning theory and Bayesian statistics, dives into the intricacies of deep learning models. He explains how singular learning theory enhances our understanding of learning dynamics and phase transitions in neural networks. The conversation explores local learning coefficients, their impact on model accuracy, and how singular learning theory compares with other frameworks. Murfet also discusses the potential for this theory to contribute to AI alignment, emphasizing interpretability and the challenges of integrating AI capabilities with human values.
Apr 30, 2024 • 2h 16min

30 - AI Security with Jeffrey Ladish

AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, dangers of open LLMs, securing against attackers, and the state of computer security. They explore undoing safety filters, AI phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.
Apr 25, 2024 • 2h 14min

29 - Science of Deep Learning with Vikrant Varma

Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.
Apr 17, 2024 • 1h 58min

28 - Suing Labs for AI Risk with Gabriel Weil

Gabriel Weil discusses using tort law to hold AI companies accountable for disasters, comparing it to regulations and Pigouvian taxation. They talk about warning shots, legal changes, interactions with other laws, and the feasibility of liability reform. The conversation also touches on the technical research needed to support this proposal and the potential impact on decision-making in the AI field.
Apr 11, 2024 • 2h 56min

27 - AI Control with Buck Shlegeris and Ryan Greenblatt

Buck Shlegeris and Ryan Greenblatt discuss AI control mechanisms for preventing AI from taking over the world. They cover topics such as protocols for AI control, preventing dangerous coded AI communication, unpredictably uncontrollable AI, and the impact of AI control on the AI safety field.
Nov 26, 2023 • 1h 57min

26 - AI Governance with Elizabeth Seger

Elizabeth Seger, a researcher specializing in AI governance, discusses the importance of democratizing AI and the risks of open-sourcing powerful AI systems. They explore the offense-defense balance, the concept of AI governance, and alternative methods for open-sourcing AI models. They also highlight the role of technical alignment researchers in improving AI governance.
Oct 3, 2023 • 3h 2min

25 - Cooperative AI with Caspar Oesterheld

Caspar Oesterheld discusses cooperative AI, its applications, and interactions between AI systems. They explore AI arms races, the limitations of game theory, and the challenges of aligning AI with human values. The conversation also covers regret minimization in decision-making, the multi-armed bandit problem, logical induction, safe Pareto improvements, and similarity-based cooperation. They highlight the importance of communication and enforcement mechanisms, and the complexities of achieving effective cooperation and alignment in AI systems.

