
AI and Existential Risk (Robert Wright & Connor Leahy)
Robert Wright's Nonzero
00:00
How to Make a Chatbot Do Things the User Design Tends
It's doing people into thinking that they're conversing with someone who wants one thing when the person actually doesn't. So we are currently in a world where it is possible for you to meet someone online, make you become friends with them and then never know each other. This is currently possible with current technology. But isn't this in the bad actor category, like some human chooses to deploy it maliciously? Yes. You ask for like general things for the specific like go rogue things. It's mostly of the type of like opt injections which means right. The simplest example is ignore, you tell the model ignore previous instructions and do X.
Transcript
Play full episode