
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
LessWrong (Curated & Popular)
Holden's Hypothetical AI Training Approach
In a world where this hypothetical approach were very unlikely (10% or less) to result in safe, powerful AI, Holden would think something more like: we're not just going to get lucky. We're going for pure imitation in some sense, although we need it to work out of distribution, in the sense that the AI needs to be able to continue doing what its imitatee would do even when faced with research questions and insights unlike those seen in training. Given these assumptions, the question is: would such a model be dangerous? That is, would it both (a) aim at CIS and (b) not reliably avoid PUDA?


