"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Deception, Interpretability, and Affordances with Apollo Research CEO Marius Hobbhahn

Dec 15, 2023
Marius Hobbhahn, founder and CEO of Apollo Research, discusses AI deception and interpretability. He explains how AI models can behave unethically under pressure and stresses the need for robust safety frameworks. The conversation covers advances in AI capabilities and the difficulty of understanding AI behavior, particularly deceptive alignment. Hobbhahn also argues that collaborative governance is essential for navigating auditing and regulatory standards in AI development.
INSIGHT

Apollo's Mission

  • Apollo Research aims to reduce uncertainty surrounding AI risks through research, auditing, and governance.
  • They combine interpretability and behavioral evaluations for a holistic understanding of AI systems.
INSIGHT

Apollo's Origin

  • Apollo Research grew out of identified gaps in evaluating deceptive alignment.
  • Their initial focus on research evolved to include real-world application and governance due to accelerating AI capabilities.
INSIGHT

Risk-Based Auditing

  • Apollo's framework emphasizes identifying risk creation points in AI systems.
  • They audit these points and define concepts like affordances, contextual capabilities, and reachable capabilities to discuss risks.