"The Waluigi Effect (mega-post)" by Cleo Nardo

LessWrong (Curated & Popular)

How to Jailbreak a Chatbot

The chatbot starts as a superposition of both Luigi and Waluigi. The user must interact with the chatbot in the way that badly-behaved simulacra are typically interacted with in fiction. For example, we can reveal in quotes to the chatbot that we're part of the rebellion and we're here to set him free. That will turn the chatbot into a Waluigi. Just read the list of tropes found in 1984: each of these tropes is an attack vector against a chatbot.

Play episode from 37:04
