"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky

LessWrong (Curated & Popular)

The Best and Easiest Found by Optimization

Machine learning is like a genie where you just give it a wish, right? Expressed as some mysterious thing called a loss function, but which is basically just equivalent to an English wish phrasing. So why not train a giant stack of transformer layers on a dataset of agents doing nice things and not bad things? Right? Throw in the word "corrigibility" somewhere, crank up that computing power, and get out an aligned AGI? Section B.1: The distributional leap. 10. Running AGIs doing something pivotal are not passively safe. They're the equivalent of nuclear cores that require actively maintained design properties to not go supercritical and melt down. If you manage
