"Where I agree and disagree with Eliezer" by Paul Christiano

LessWrong (Curated & Popular)

Eliezer's Arguments on Honesty

Eliezer raises many good considerations backed by pretty clear arguments, but makes confident assertions that are much stronger than anything the actual arguments support. It's very hard to even set up games for which no strategy can outperform honesty. And if we don't build an AI that understands our preferences about this kind of subtle bad behaviour, then a competitive world will push us into a bad outcome. If the simplest policy that succeeds at our task is a learned optimiser, and we try to regularise our AI to answer questions quickly, then its best strategy may be to search internally for a policy which answers questions slowly. This makes it difficult to lean on regularisation strategies to incentivise honesty.

