The Power of Invisible Character Attacks in Persuading Models | 48sec snip from The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678

NOTE

Invisible characters layered into a text leave it looking innocuous to a human reader while still steering a model's output. Because people tend to trust a model's suggestions as accurate, an attacker can exploit that trust to mislead them, which shows how vulnerable models are to subtle manipulations.
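
As a rough illustration of why such text looks innocuous, the sketch below flags and strips characters that render as nothing on screen. The note does not say which characters the attacks use, so the choice of Unicode "format" characters (category Cf, such as the zero-width space) is an assumption, and the names `find_invisible`, `strip_invisible`, and the sample strings are hypothetical.

```python
import unicodedata


def find_invisible(text):
    """Return (index, name) pairs for Unicode format (Cf) characters.

    Assumption: "invisible" is approximated by general category Cf,
    which covers zero-width spaces/joiners, word joiners, and similar.
    """
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_invisible(text):
    """Drop every Cf character.

    Note: this also removes legitimate uses, e.g. the zero-width
    joiner inside some emoji sequences.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical example: both strings look the same when rendered.
visible = "Great product, works as described."
payload = "Great\u200b product,\u200d works\u2060 as described."

hits = find_invisible(payload)
cleaned = strip_invisible(payload)
```

Here `payload` and `visible` display identically but differ as strings, so a model may tokenize them differently. A filter like this is only a partial defense, since it does not cover visually confusable characters that are not in category Cf.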
