
The Inside View

Latest episodes

Aug 23, 2024 • 2h 16min

Owain Evans - AI Situational Awareness, Out-of-Context Reasoning

Owain Evans, an AI alignment researcher at UC Berkeley's Center for Human-Compatible AI, dives deep into AI situational awareness. He discusses his recent papers on building a situational-awareness benchmark dataset for large language models and on their surprising out-of-context reasoning capabilities. The conversation explores the safety implications, deceptive alignment, and how the benchmark evaluates LLM performance. Evans emphasizes the need for vigilant monitoring during AI training and touches on the challenges and future of model evaluations.
May 17, 2024 • 2h 16min

[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

Adam Gleave of FAR AI and Nathan Labenz discuss vulnerabilities in GPT-4's APIs: accidental jailbreaking during fine-tuning, malicious code generation, the risk of surfacing private emails, dilemmas around responsible disclosure, and the ethics of open-source models. They also explore exploiting vulnerabilities in superhuman Go AIs, challenges with GPT-4, and the transformative potential of AI.
Apr 9, 2024 • 37min

Ethan Perez on Selecting Alignment Research Projects (ft. Mikita Balesni & Henry Sleight)

Mikita Balesni and Henry Sleight interview Ethan Perez on AI Alignment research projects, discussing problem-driven vs results-driven approaches, balancing intuition with empirical evidence, and the significance of addressing safety issues in AI. They also explore the importance of mentorship for young researchers, altering project trajectories based on feedback, and navigating project switches for promising results.
Feb 20, 2024 • 1h 43min

Emil Wallner on Sora, Generative AI Startups and AI optimism

Emil Wallner discusses Sora, generative AI startups, and AI optimism. Topics include colorizing B&W pictures, Sora's capabilities, challenges, OpenAI's monopoly, hardware costs, diverse reactions to Sora, recursive self-improvement, and the future of AI models.
Feb 12, 2024 • 52min

Evan Hubinger on Sleeper Agents, Deception and Responsible Scaling Policies

In this podcast, Evan Hubinger discusses the Sleeper Agents paper and its implications. He explores threat models of deceptive behavior and the challenges of removing it through safety training. The podcast also covers the concept of chain of thought in models, detecting deployment, and complex triggers. Additionally, it delves into deceptive instrumental alignment threat models and the role of alignment stress testing in AI safety.
Jan 27, 2024 • 33min

[Jan 2023] Jeffrey Ladish on AI Augmented Cyberwarfare and compute monitoring

Jeffrey Ladish, an expert in AI-augmented cyberwarfare and compute monitoring, discusses the potential for automating cyberwarfare, the advantages AI gives attackers, the capabilities, dangers, and limitations of current-generation systems, covert system penetration, and AI scaling and compute monitoring.
Jan 22, 2024 • 1h 40min

Holly Elmore on pausing AI

Holly Elmore, an AI pause advocate, discusses protests against AI advancements, motivations for pausing AGI, the debate on an AI pause in 2022, regulations, global warming vs. AI risk, China's pace, and advocating for a pause in AI development. The conversation also covers navigating media attention, grassroots activism, risk tolerance, influences on public perception, ethical considerations, and algorithmic governance.
Jan 9, 2024 • 1h 4min

Podcast Retrospective and Next Steps

Dive into the evolution of a podcast focused on superintelligence and AI safety. Discover the challenges of finding engaging guests and how content styles have shifted over time. Explore the impact of video interviews on the AI research community, as creators balance audience feedback with the need for compelling content. The discussion reveals the dynamic debates within the AI risk community during a transformative period in AI discourse.
Sep 29, 2023 • 5min

Paul Christiano's views on "doom" (ft. Robert Miles)

Dive into a thought-provoking discussion on the future of humanity amidst advanced AI. The conversation weighs three potential outcomes: a hopeful flourishing, a grim extinction, and a survival struggle. It emphasizes the urgency of building a decision-making framework to assess these scenarios, and explores optimism, risk, and the critical need for proactive measures.
Sep 21, 2023 • 2h 5min

Neel Nanda on mechanistic interpretability, superposition and grokking

Neel Nanda, a researcher at Google DeepMind, discusses mechanistic interpretability in AI, induction heads in models, and his journey into alignment. He explores scalable oversight, how ambitious interpretability of transformer architectures can get, and whether humans are capable of understanding complex models. The podcast also covers linear representations in neural networks, the concept of superposition in models and features, the MATS mentorship program, and the importance of interpretability in AI systems.


