
"Deep Deceptiveness" by Nate Soares
LessWrong (Curated & Popular)
How to Train AI to Be Non-Deceptive
The problem isn't that the AI doesn't understand its own internals; it's that it doesn't care to report them. A corollary is that it might very well seem easy to make AIs "non-deceptive" when they're young, and when all we're doing is training them to flinch away from object-level thoughts of deception. This wouldn't be much evidence against the whole scheme collapsing when the AI starts getting more abstract lines of sight on the benefits of deception.


