"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

LessWrong (Curated & Popular)

Holden's Hypothetical AI Training Approach

In a world where this hypothetical approach was very unlikely (10% or less) to result in safe, powerful AI, Holden would think something more like "we're just not going to get lucky." We're going for pure imitation in some sense, although we need it to work out of distribution, in the sense that the AI needs to be able to continue doing what its imitatee would do even when faced with research questions and insights unlike those seen in training. Given these assumptions, the question is: would such a model be dangerous? That is, would it both (a) aim at CIS and (b) not reliably avoid PUDA?

