How to Train AI to Be Non-Deceptive

The problem isn't that the AI doesn't understand its own internals; it's that it doesn't care to report them. A corollary is that it might very well seem easy to make AIs "non-deceptive" when they're young and all we're doing is training them to flinch away from object-level thoughts of deception. That apparent success wouldn't be much evidence against the whole scheme collapsing once the AI starts getting more abstract lines of sight on the benefits of deception.

