
Artificial General Intelligence (AGI) Show with Soroush Pour

Ep 11 - Technical alignment overview w/ Thomas Larsen (Director of Strategy, Center for AI Policy)

Dec 14, 2023
In this episode, Soroush Pour interviews Thomas Larsen, Director of Strategy at the Center for AI Policy. They discuss technical alignment research areas, including scalable oversight, interpretability, heuristic arguments, model evaluations, and agent foundations. They also explore the concept of AIXI and its uncomputability, building multi-level world models, inverse reinforcement learning, and cooperative AI. The conversation concludes with a discussion of future challenges and cooperation among AI systems.
01:37:19

Episode guests

Thomas Larsen (Director of Strategy, Center for AI Policy)


Quick takeaways

  • Scalable oversight aims to provide reliable feedback signals to AI models even as they become smarter than humans, using methods such as reinforcement learning from human feedback (RLHF) and debate.
  • Interpretability research aims to uncover the reasoning and thought processes of AI models, but faces challenges with spurious correlations and generalization.

Deep dives

Scalable Oversight

Scalable oversight aims to provide feedback signals for AI models even as they become smarter than humans. Methods such as RLHF and debate are used to oversee and align AI systems. The core challenge is eliciting accurate human feedback once AI systems are more capable than their overseers, and avoiding deceptive alignment. Control mechanisms are also put in place to prevent AI systems from self-exfiltration.
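
As a rough illustration of the feedback-signal idea (not something from the episode itself), the sketch below shows the standard Bradley-Terry preference loss used to train a reward model in RLHF-style pipelines. The RewardModel class, embedding inputs, and tensor shapes are simplified placeholders, not any particular lab's implementation.

```python
# Minimal sketch of the preference-modelling step behind RLHF.
# Assumes responses are already embedded as fixed-size vectors;
# a real pipeline would score full prompt/response token sequences.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the chosen response's score above the
    rejected one's. The trained scalar reward is what the policy is
    later optimised against (e.g. with PPO)."""
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random "embeddings" stand in for human-labelled pairs of
# chosen vs. rejected responses to the same prompt.
model = RewardModel()
chosen, rejected = torch.randn(8, 128), torch.randn(8, 128)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()  # gradients flow into the reward model's weights
```

The point of the sketch is that human judgments enter training only through pairwise comparisons; scalable-oversight proposals such as debate are about keeping those comparisons accurate once the compared outputs exceed what a human can evaluate directly.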
