Artificial General Intelligence (AGI) Show with Soroush Pour

Ep 11 - Technical alignment overview w/ Thomas Larsen (Director of Strategy, Center for AI Policy)

Dec 14, 2023
In this episode, Soroush Pour interviews Thomas Larsen, Director of Strategy at the Center for AI Policy. They survey technical alignment research areas, including scalable oversight, interpretability, heuristic arguments, model evaluations, and agent foundations, among others. They also cover AIXI, uncomputability, building a multi-level world model, inverse reinforcement learning, and cooperative AI. The conversation closes with a discussion of future challenges and cooperation in AI systems.
01:37:19

Podcast summary created with Snipd AI

Quick takeaways

  • Scalable oversight aims to keep providing feedback signals to AI models as they become smarter than humans, using methods such as RLHF and debate.
  • Interpretability research aims to uncover the reasoning and thought processes of AI models, but faces challenges with spurious correlations and generalization.

Deep dives

Scalable Oversight

Scalable oversight aims to provide feedback signals for AI models even as they become smarter than humans. Methods such as reinforcement learning from human feedback (RLHF) and debate are used to oversee and align AI systems. The challenge lies in keeping human feedback accurate as AI systems become more capable, and in avoiding deceptive alignment. Control mechanisms are also used to prevent AI systems from self-exfiltrating.
