"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

LessWrong (Curated & Popular)

The Implications of Holden's Training Setup

POUDA-avoidance patterns might also be inherently more "brittle" than CIS patterns, since the latter correspond to deep facts about how to accomplish approximately whatever. So unless we find a way of resolving the central tension, you have to aim at CIS to do helpful things, and reliable POUDA avoidance is much harder or longer to learn than aiming at CIS. If this procedure results in a dangerous AI, it's not clear.
