
"Where I agree and disagree with Eliezer" by Paul Christiano
LessWrong (Curated & Popular)
How to Align AI With SGD
There is probably no physically implemented reward function, of the kind that could be optimized with SGD, whose optimum is an aligned AI. I'm most optimistic about approaches where RL is performed only on a reward function that gets smarter in parallel with the agent being trained. No current plan for aligning AI has a particularly high probability of working without a lot of iteration and modification. And even if alignment ends up being easy, we would likely be wrong about the way in which it turns out to be easy.
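The "reward function that gets smarter in parallel with the agent" idea can be pictured as a training loop that alternates between the agent optimizing the current learned reward model and the reward model being refit to fresh judgments of the agent's new behavior. Below is a minimal toy sketch of that loop, assuming a numpy-only setup; the names and the quadratic toy reward (true_judgment, fit_reward_model, TRUE_OPTIMUM) are illustrative assumptions, not anything taken from Christiano's post.

```python
# Toy sketch (assumption: an illustration of "the reward model gets smarter in
# parallel with the agent", not Paul Christiano's actual proposal).
#
# The "agent" is a point in R^2 that hill-climbs a learned reward model.
# The reward model is a crude quadratic fit to scalar judgments from an
# oracle standing in for (assisted) human evaluation. Each round the reward
# model is refit on judgments of behavior near the agent's current policy,
# so the reward signal keeps pace with the agent it is training.

import numpy as np

rng = np.random.default_rng(0)
TRUE_OPTIMUM = np.array([2.0, -1.0])          # what "we actually want"

def true_judgment(x):
    """Oracle standing in for human evaluation (hypothetical)."""
    return -np.sum((x - TRUE_OPTIMUM) ** 2)

def fit_reward_model(samples, scores):
    """Fit reward_hat(x) = -||x - c||^2 by picking the best center among samples."""
    best_c, best_err = None, np.inf
    for c in samples:
        pred = -np.sum((samples - c) ** 2, axis=1)
        err = np.sum((pred - scores) ** 2)
        if err < best_err:
            best_c, best_err = c, err
    return best_c

agent = np.array([10.0, 10.0])                # initial "policy"
reward_center = np.zeros(2)                   # initial, dumb reward model

for round_ in range(20):
    # --- RL step: agent climbs the *current* reward model ---
    for _ in range(50):
        grad = -2 * (agent - reward_center)   # gradient of -||x - center||^2
        agent = agent + 0.05 * grad

    # --- Reward-model step: gather judgments near the agent's new behavior ---
    samples = agent + rng.normal(scale=1.0, size=(64, 2))
    scores = np.array([true_judgment(s) for s in samples])
    reward_center = fit_reward_model(samples, scores)

print("agent ended near", agent, "target was", TRUE_OPTIMUM)
```

The point of the alternation is that the agent only ever optimizes a reward signal that has just been updated to evaluate behavior at roughly its current level of competence, rather than a fixed reward function it can eventually outstrip.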


