"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky

LessWrong (Curated & Popular)

How to Find a Simple Corrigible Alignment

There is no analogous truth about there being a simple core of alignment, especially not one that is even easier for gradient descent to find than it would have been for natural selection to just find. Many anti-corrigible lines of reasoning like this may only first appear at high levels of intelligence. Within a core of general intelligence, the capability generalizes far outside its original distribution. We've got no idea what's actually going on inside the giant inscrutable matrices.

Play episode from 38:47
