
"Where I agree and disagree with Eliezer" by Paul Christiano
LessWrong (Curated & Popular)
Eliezer's Arguments on Honesty
Eliezer raises many good considerations backed by pretty clear arguments, but makes confident assertions that are much stronger than anything the actual arguments support. It's very hard to even set up games for which no strategy can outperform honesty. And if we don't build an AI that understands our preferences about this kind of subtle bad behaviour, then a competitive world will push us into a bad outcome. If the simplest policy to succeed at our task is a learned optimiser, and we try to regularise our AI to answer questions quickly, then its best strategy may be to internally search for a policy which answers questions slowly. This makes it difficult to lean on regularisation strategies to incentivise honesty.
Transcript


