Is corrigibility a coherent, learnable concept?

Max considers whether corrigibility clusters naturally and suggests empirical tests to check its coherence and generality.

Episode notes

Is focusing on corrigibility our best shot at getting to ASI alignment?

Max Harms and Jeremy Gillen are current and former MIRI alignment researchers who both see superintelligent AI as an imminent extinction threat, but disagree about Max's proposal of Corrigibility As Singular Target (CAST).

Max thinks focusing on corrigibility is the most plausible path to build ASI without losing control and dying, while Jeremy is skeptical that attempting CAST would lead to better superintelligent AI behavior on a sufficiently early try.

We recorded a friendly debate to understand the crux of Max and Jeremy's disagreement. The conversation also doubles as a way to learn about Max's Corrigibility As Singular Target proposal.

Video

Podcast

Listen on Spotify, import the RSS feed, or search "Doom Debates" in your podcast player.

Plus: Max's New Book, Red Heart

Max just published Red Heart, a realistic sci-fi thriller that brings the corrigibility problem to life through a high-stakes Chinese government AI project.

I thoroughly enjoyed reading it and highly recommend it! The last 20 minutes of my conversation with Max are all about Red Heart.

Transcript

Episode Preview

Max Harms 00:00:00
If you mess up real bad, this thing goes and eats [...]

---

Outline:

(00:14) Is focusing on corrigibility our best shot at getting to ASI alignment?

(01:08) Video

(01:14) Podcast

(01:24) Plus: Max's New Book, Red Heart

(01:55) Transcript

(01:58) Episode Preview