AI Safety Fundamentals: Alignment cover image

Debate Update: Obfuscated Arguments Problem

AI Safety Fundamentals: Alignment

00:00

How to Distinguish Between Honest and Dishonest Arguments

In this example the honest and dishonest arguments look very qualitatively different. With our current mechanism we can't tell these two situations apart in any clever way. As far as we can tell it's possible to construct obfuscated dishonest arguments that aren't distinguishable from honest arguments for all of the physics questions we've been studying. There are various historical examples of incorrect proofs that were believed to be correct for years. It seems like we should be able to identify and taboo dishonest arguments of this flavour. However in practical cases it seems like there are obfuscated dishonest arguing that are harder to distinguish from honest arguments.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app