
"Where I agree and disagree with Eliezer" by Paul Christiano
LessWrong (Curated & Popular)
How to Align AI With SGD
There is probably no physically implemented reward function, of the kind that could be optimized with SGD, whose optimum is an aligned AI. I'm most optimistic about approaches where RL is performed only on a reward function that gets smarter in parallel with the agent being trained. No current plan for aligning AI has a particularly high probability of working without a lot of iteration and modification. And even if alignment ends up being easy, we would likely be wrong about the way in which it turns out to be easy.
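The "reward function that gets smarter in parallel with the agent" idea can be pictured as a training loop that alternates between the agent optimizing the current learned reward model and the reward model being refit to fresh judgments of the agent's new behavior. Below is a minimal toy sketch of that loop, assuming a numpy-only setup; the names and the quadratic toy reward (true_judgment, fit_reward_model, TRUE_OPTIMUM) are illustrative assumptions, not anything taken from Christiano's post.

```python
# Toy sketch (assumption: an illustration of "the reward model gets smarter in
# parallel with the agent", not Paul Christiano's actual proposal).
#
# The "agent" is a point in R^2 that hill-climbs a learned reward model.
# The reward model is a crude quadratic fit to scalar judgments from an
# oracle standing in for (assisted) human evaluation. Each round the reward
# model is refit on judgments of behavior near the agent's current policy,
# so the reward signal keeps pace with the agent it is training.

import numpy as np

rng = np.random.default_rng(0)
TRUE_OPTIMUM = np.array([2.0, -1.0])          # what "we actually want"

def true_judgment(x):
    """Oracle standing in for human evaluation (hypothetical)."""
    return -np.sum((x - TRUE_OPTIMUM) ** 2)

def fit_reward_model(samples, scores):
    """Fit reward_hat(x) = -||x - c||^2 by picking the best center among samples."""
    best_c, best_err = None, np.inf
    for c in samples:
        pred = -np.sum((samples - c) ** 2, axis=1)
        err = np.sum((pred - scores) ** 2)
        if err < best_err:
            best_c, best_err = c, err
    return best_c

agent = np.array([10.0, 10.0])                # initial "policy"
reward_center = np.zeros(2)                   # initial, dumb reward model

for round_ in range(20):
    # --- RL step: agent climbs the *current* reward model ---
    for _ in range(50):
        grad = -2 * (agent - reward_center)   # gradient of -||x - center||^2
        agent = agent + 0.05 * grad

    # --- Reward-model step: gather judgments near the agent's new behavior ---
    samples = agent + rng.normal(scale=1.0, size=(64, 2))
    scores = np.array([true_judgment(s) for s in samples])
    reward_center = fit_reward_model(samples, scores)

print("agent ended near", agent, "target was", TRUE_OPTIMUM)
```

The point of the alternation is that the agent only ever optimizes a reward signal that has just been updated to evaluate behavior at roughly its current level of competence, rather than a fixed reward function it can eventually outstrip.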


