"Deep Deceptiveness" by Nate Soares

LessWrong (Curated & Popular)

The AI Inferring Possible Interruptions

In the real world, there will be specific pathways by which the AI infers possible interruptions. In this case, it doesn't think of wet-lab response delays as the result of operator review at all. This is one way the AI might begin to notice a shadow of the true fact that it cannot think deceptive thoughts directly. You can perhaps start to see how generic thought patterns can combine to produce deceptiveness without any individual pattern being deceptive in its own right. Let's watch it develop further.

Play episode from 09:20
