"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

AI Deception, Interpretability, and Affordances with Apollo Research CEO Marius Hobbhahn

Dec 15, 2023

Marius Hobbhahn, Founder and CEO of Apollo Research, discusses AI deception and interpretability. He describes how AI models can behave unethically under pressure and emphasizes the need for robust frameworks to ensure safety. The conversation covers advances in AI capabilities and the difficulty of understanding AI behavior, particularly with regard to deceptive alignment. Hobbhahn also argues that collaborative governance is essential for developing auditing and regulatory standards in AI development.

INSIGHT

Apollo's Mission

Apollo Research aims to reduce uncertainty surrounding AI risks through research, auditing, and governance.
They combine interpretability and behavioral evaluations for a holistic understanding of AI systems.

INSIGHT

Apollo's Origin

Apollo Research grew out of gaps identified in how deceptive alignment is evaluated.
Their initial focus on research evolved to include real-world application and governance due to accelerating AI capabilities.

INSIGHT

Risk-Based Auditing

Apollo's framework emphasizes identifying risk creation points in AI systems.
They audit these points and define concepts like affordances, contextual capabilities, and reachable capabilities to discuss risks.
