
Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
The Power of Invisible Character Attacks in Persuading Models
Text laced with invisible characters can look innocuous to a human reader while still steering a model's output. Because people tend to trust a model's decisions, an attacker can exploit that trust and lead them to accept manipulated suggestions as accurate. The technique shows how vulnerable models are to subtle changes in their input.
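To make the idea concrete, here is a minimal Python sketch of how hidden characters in a prompt could be spotted and stripped. It assumes that "invisible characters" means Unicode format characters (general category `Cf`, for example the zero-width space U+200B); the summary does not say which characters the attacks discussed in the episode use. The function names `find_invisible_chars` and `strip_invisible_chars` are hypothetical, chosen for illustration only.

```python
import unicodedata


def find_invisible_chars(text):
    """Return (index, code point, name) for each Unicode format (Cf) character.

    Assumption: 'invisible' is approximated by general category Cf, which
    covers zero-width spaces, joiners, and similar non-printing characters.
    """
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


def strip_invisible_chars(text):
    """Return the text with all Cf characters removed."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Two strings that render the same on screen but differ at the code point level.
visible = "Please summarize this review."
tampered = "Please\u200b summarize\u2060 this review."

for index, codepoint, name in find_invisible_chars(tampered):
    print(index, codepoint, name)
```

A filter like this only covers one class of hidden input. Removing `Cf` characters can also alter legitimate text in scripts that rely on joiners, so in practice it may be safer to flag such characters for review than to strip them silently.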