AI Safety Fundamentals: Alignment

Debate Update: Obfuscated Arguments Problem

The Obfuscated Argument Problem

The obfuscated argument problem suggests that we may not be able to rely on debaters to find flaws in large arguments: there is a class of dishonest arguments that we see no way to distinguish from honest ones. This question is probably better investigated through ML experiments or theoretical research than through human experiments. To supervise ML systems that make such decisions, we need to be able to trust the representations or heuristics that our models learn from the training data.
