
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
LessWrong (Curated & Popular)
The Doom of AI Training
Holden Karnofsky: I picture Nate trying to think through, in a more detailed and mechanistic way than I can easily picture, how a training process could lead an AI to the point of being able to do useful alignment research. As he does this, Nate feels like it keeps requiring a really intense level of CIS, which in turn leads the AI into situations that are highly exotic in some sense. Like, most humans empirically don't invent enough nanotech to move the needle, and most societies that are able to do that much radically new reasoning do undergo big cultural shifts relative to their surroundings. And the sort of corrigibility you learn in training doesn't generalize how we'd want.


