Robert Wright's Nonzero cover image

AI and Existential Risk (Robert Wright & Connor Leahy)

Robert Wright's Nonzero

00:00

How to Make a Chatbot Do Things the User Design Tends

It's doing people into thinking that they're conversing with someone who wants one thing when the person actually doesn't. So we are currently in a world where it is possible for you to meet someone online, make you become friends with them and then never know each other. This is currently possible with current technology. But isn't this in the bad actor category, like some human chooses to deploy it maliciously? Yes. You ask for like general things for the specific like go rogue things. It's mostly of the type of like opt injections which means right. The simplest example is ignore, you tell the model ignore previous instructions and do X.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app