Debate Update: Obfuscated Arguments Problem

AI Safety Fundamentals: Alignment

CHAPTER

The Obfuscated Argument Problem

The obfuscated argument problem suggests that we may not be able to rely on debaters to find the flaws in large arguments: a dishonest debater can split a claim into many plausible-looking steps and hide the flaw so that neither debater can point to where the argument fails. We currently see no way to distinguish this class of dishonest arguments from honest ones. The problem is probably better investigated through ML experiments or theoretical research than through human experiments. To supervise ML systems whose decisions rest on such large arguments, we need to be able to trust the representations or heuristics that our models learn from the training data.
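
As a rough intuition pump (a hypothetical sketch, not from the episode), the Python snippet below shows why a verifier with a bounded checking budget struggles with large arguments: when a single flaw is hidden among thousands of plausible-looking steps and only a handful can be spot-checked, the flawed argument is almost always accepted. All names and parameters here (build_argument, spot_check, N_STEPS, BUDGET) are illustrative.

```python
import random

N_STEPS = 10_000   # size of the large argument (illustrative)
BUDGET = 10        # steps the bounded verifier can afford to check

def build_argument(n_steps: int) -> list[bool]:
    """An argument as a list of steps; True = valid, False = flawed.
    Exactly one flaw is hidden at a position no one can identify."""
    steps = [True] * n_steps
    steps[random.randrange(n_steps)] = False
    return steps

def spot_check(steps: list[bool], budget: int) -> bool:
    """Check `budget` randomly chosen steps; accept the argument
    iff every checked step is valid."""
    sample = random.sample(range(len(steps)), budget)
    return all(steps[i] for i in sample)

# Estimate how often the flawed argument is wrongly accepted.
trials = 1_000
accepted = sum(spot_check(build_argument(N_STEPS), BUDGET)
               for _ in range(trials))
print(f"Flawed argument accepted in {accepted}/{trials} trials "
      f"(expected rate ~{1 - BUDGET / N_STEPS:.3f})")
```

Debate was intended to get around this budget limit by letting the honest debater point directly at the flawed step; the obfuscated argument problem is that, for some dishonestly constructed arguments, neither debater knows which step that is.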
