Theo Jaffee Podcast

#5: Quintin Pope - AI alignment, machine learning, failure modes, and reasons for optimism

Oct 1, 2023
02:36:28

Quintin Pope is a machine learning researcher focusing on natural language modeling and AI alignment. Among alignment researchers, Quintin stands out for his optimism. He believes that AI alignment is far more tractable than it seems, and that we appear to be on a good path to making the future great. On LessWrong, he's written one of the most popular posts of the last year, “My Objections To ‘We're All Gonna Die with Eliezer Yudkowsky’”, as well as many other highly upvoted posts on various alignment papers, and on his own theory of alignment, shard theory.

CHAPTERS:

Introduction (0:00)

What Is AGI? (1:03)

What Can AGI Do? (12:49)

Orthogonality (23:14)

Mind Space (42:50)

Quintin’s Background and Optimism (55:06)

Mesa-Optimization and Reward Hacking (1:02:48)

Deceptive Alignment (1:11:52)

Shard Theory (1:24:10)

What Is Alignment? (1:30:05)

Misalignment and Evolution (1:37:21)

Mesa-Optimization and Reward Hacking, Part 2 (1:46:56)

RL Agents (1:55:02)

Monitoring AIs (2:09:29)

Mechanistic Interpretability (2:14:00)

AI Disempowering Humanity (2:28:13)
