
"Discussion with Nate Soares on a key alignment difficulty" by Holden Karnofsky
LessWrong (Curated & Popular)
Mini-AIs are developed based on predicting all kinds of situations where Alice-ish stuff worked, not just predicting Alice. But once the AI is out of training and trying to push the research frontier, its mini-AIs will continue to cause it to do "reflection." Lacking mini-AIs that track what Alice herself would actually do, the AI has no way of taking truly Alice-like next steps at difficult junctures in alignment research. So it makes moves that are kind of locally or superficially Alice-like, but don't actually match what she would do in tough situations. There could imaginably be junctures where the AI needs to be good at poodle avoidance.
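One rough way to see the worry being gestured at here is as a distribution-shift problem in imitation learning. The toy sketch below is not from the source; the regime shift, the polynomial "imitator," and every name in it are illustrative assumptions. It fits a shallow pattern-matcher to "Alice-like" behavior in a familiar regime, then shows it confidently extrapolating the old pattern past the frontier, where the true policy changes.

```python
# Toy illustration (not from the source) of the distribution-shift worry:
# a predictor trained to imitate "Alice" on familiar problems keeps emitting
# familiar-looking moves once the problems move past its training frontier.

import numpy as np

rng = np.random.default_rng(0)

def alice_policy(x):
    # Hypothetical ground truth: what Alice would actually do.
    # On hard (large-x) problems her strategy changes qualitatively.
    return np.sin(x) if x < 3.0 else np.sin(x) + 2.0  # regime shift at x = 3

# Training data only covers the "easy" regime (x < 3): the model never
# sees the junctures where Alice's behavior differs.
X_train = rng.uniform(0.0, 3.0, size=200)
y_train = np.array([alice_policy(x) for x in X_train])

# Fit a shallow pattern-matcher (a degree-3 polynomial: our "mini-AIs").
coeffs = np.polyfit(X_train, y_train, deg=3)
imitator = np.poly1d(coeffs)

for x in [1.0, 2.5, 4.0, 5.0]:
    print(f"x={x}: imitator={imitator(x):+.2f}  true Alice={alice_policy(x):+.2f}")
# In-distribution (x < 3) the imitation is close; past the frontier (x > 3)
# it keeps extrapolating the old pattern and misses Alice's actual move.
```

The analogy: the polynomial plays the role of the mini-AIs, cheap regularities distilled from situations where Alice-ish behavior was observed, and nothing in training forces them to agree with Alice once the inputs move past anything they were fit on.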


