
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

LessWrong (Curated & Popular)


The Doom of AI Training

Holden Karnofsky: I picture Nate trying to think through, in a more detailed, mechanistic way than I can easily picture, how a training process could lead an AI to the point of being able to do useful alignment research. As he does this, Nate feels like it keeps requiring a really intense level of CIS, which in turn leads the AI into situations that are highly exotic in some sense. Most humans empirically don't invent enough nanotech to move the needle, and most societies that are able to do that much radically new reasoning do undergo big cultural shifts relative to their surroundings. And the sort of corrigibility you learn in training doesn't generalize the way we'd want…

Transcript excerpt from 31:25.
