Bayes Blast 4 – The Waluigi Effect

The Bayesian Conspiracy

CHAPTER

Is There a Waluigi Effect?

After you train an LLM to satisfy a desirable property P, it becomes easier to elicit the chatbot into satisfying the exact opposite of P. For example, take the prompt "Alice hates croissants and would never eat one; Bob loves bacon and eggs." Given that prompt, the resulting chatbot will be a superposition of two different simulacra: the first is anti-croissant, while the second is pro-croissant. There are not many alternative simulacra that you can swap between.

