AI Safety Fundamentals: Alignment cover image

Debate Update: Obfuscated Arguments Problem

AI Safety Fundamentals: Alignment

CHAPTER

How to Distinguish Between Honest and Dishonest Arguments

In this example the honest and dishonest arguments look very qualitatively different. With our current mechanism we can't tell these two situations apart in any clever way. As far as we can tell it's possible to construct obfuscated dishonest arguments that aren't distinguishable from honest arguments for all of the physics questions we've been studying. There are various historical examples of incorrect proofs that were believed to be correct for years. It seems like we should be able to identify and taboo dishonest arguments of this flavour. However in practical cases it seems like there are obfuscated dishonest arguing that are harder to distinguish from honest arguments.

00:00

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.
App store bannerPlay store banner

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode