
"AGI Ruin: A List of Lethalities" by Eliezer Yudkowsky

LessWrong (Curated & Popular)


How to Predict the Alignment Problems of Superintelligence

Many alignment problems of superintelligence will not naturally appear at pre-dangerous, passively safe levels of capability. Problems that materialize at high intelligence and danger levels may fail to show up at safe, lower levels of intelligence. Pivotal weak acts like this aren't known, and not for want of people looking for them. So again, you end up needing alignment to generalize out of the training distribution. You don't get a thousand failed tries at burning all GPUs, because people will notice, even leaving out the consequences of capabilities success and alignment failure.
