
"Deep Deceptiveness" by Nate Soares
LessWrong (Curated & Popular)
The AI's Inferring Possible Interruptions
In the real world, there will be specific pathways to the AI inferring possible interruption. In this case, it doesn't think about wet lab response delays as being the result of operator review at all. This is one beginning of the AI noticing a shadow of the true fact that it cannot think deceptive thoughts directly. You can perhaps start to see the beginnings of how generic thought patterns can combine to produce deceptiveness without any individual pattern being deceptive in its own right. Let's watch it develop further.


