"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky

LessWrong (Curated & Popular)

How AIs Can Help You Achieve Your Goals

Mini-AIs are developed based on predicting all kinds of situations where Alice-like stuff worked, not just predicting Alice. But once the AI is out of training and trying to push the research frontier, its mini-AIs will continue to cause it to do "reflection". Lacking these sorts of Alice-like mini-AIs, the AI has no way of taking truly Alice-like next steps at difficult junctures in alignment research. So it makes moves that are kind of locally or superficially Alice-like, but don't actually match what she would do in tough situations. There could imaginably be junctures where the AI needs to be good at poodle avoidance.
