
"Deep Deceptiveness" by Nate Soares
LessWrong (Curated & Popular)
The AI's Anthropomorphized Threads of Deliberation
A fledgling AI is being deployed towards building something like a bacterium, but with a diamondoid shell. The original hope was to have the AI assist human engineers, but the first versions that were able to do the hardest parts of the engineering work at all were able to go much farther on their own. We'll suppose the AI has already been trained by gradient descent against deceptive outputs and has ended up with internal mechanisms that detect and shut down the precursors of deceptive thinking. As the instructions near completion, the thread forks off a sub-thread for figuring out what might go wrong. This sub-thread deploys a new cognitive strategy that has been sharply rewarded twice in recent history. Simplify and


