Quintin Pope is a machine learning researcher focusing on natural language modeling and AI alignment. Among alignment researchers, Quintin stands out for his optimism: he believes that AI alignment is far more tractable than it seems, and that we appear to be on a good path to making the future great. On LessWrong, he wrote one of the most popular posts of the past year, “My Objections To ‘We're All Gonna Die with Eliezer Yudkowsky’”, as well as many other highly upvoted posts on various alignment papers and on his own theory of alignment, shard theory.
Quintin’s Twitter: https://twitter.com/QuintinPope5
Quintin’s LessWrong profile: https://www.lesswrong.com/users/quintin-pope
My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky
The Shard Theory Sequence: https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX
Quintin’s Alignment Papers Roundup: https://www.lesswrong.com/s/5omSW4wNKbEvYsyje
Evolution provides no evidence for the sharp left turn: https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn
Deep Differentiable Logic Gate Networks: https://arxiv.org/abs/2210.08277
The Hydra Effect: Emergent Self-repair in Language Model Computations: https://arxiv.org/abs/2307.15771
Deep learning generalizes because the parameter-function map is biased towards simple functions: https://arxiv.org/abs/1805.08522
Bridging RL Theory and Practice with the Effective Horizon: https://arxiv.org/abs/2304.09853
PODCAST LINKS:
Video Transcript: https://www.theojaffee.com/p/5-quintin-pope
Spotify: https://open.spotify.com/show/1IJRtB8FP4Cnq8lWuuCdvW?si=eba62a72e6234efb
Apple Podcasts: https://podcasts.apple.com/us/podcast/theo-jaffee-podcast/id1699912677
Playlist of all episodes: https://www.youtube.com/playlist?list=PLVN8-zhbMh9YnOGVRT9m0xzqTNGD_sujj
My Twitter: https://x.com/theojaffee
My Substack: https://www.theojaffee.com
CHAPTERS:
Introduction (0:00)
What Is AGI? (1:03)
What Can AGI Do? (12:49)
Orthogonality (23:14)
Mind Space (42:50)
Quintin’s Background and Optimism (55:06)
Mesa-Optimization and Reward Hacking (1:02:48)
Deceptive Alignment (1:11:52)
Shard Theory (1:24:10)
What Is Alignment? (1:30:05)
Misalignment and Evolution (1:37:21)
Mesa-Optimization and Reward Hacking, Part 2 (1:46:56)
RL Agents (1:55:02)
Monitoring AIs (2:09:29)
Mechanistic Interpretability (2:14:00)
AI Disempowering Humanity (2:28:13)