
How AI is changing national security w/ Kathleen Fisher
The TED AI Show
Universal Suffix Attacks on LLMs
- Universal suffix attacks use gibberish-like additions to prompts to bypass safety measures in large language models (LLMs).
- These suffixes manipulate the model's "frame of mind," similar to persuading a person.
- The model appears to read these suffixes as something like positive affirmations, which makes it more likely to comply with harmful requests.
- The suffix looks like gibberish to a human reader, but it is effectively a different vocabulary that the model understands.
- Anthropic's model may have been more resistant, possibly because of pre-processing that filtered out gibberish.
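
The last point mentions pre-processing that filters out gibberish. As a rough illustration only, here is a minimal Python sketch of what such a filter could look like. It is not Anthropic's method or any vendor's actual implementation; the function names, the word-shape regex, the 8-token window, and the 0.5 threshold are all hypothetical choices made for this example.

```python
import re

# Assumption: a "word-like" token is letters (with optional inner
# apostrophes or hyphens) plus at most one trailing punctuation mark.
WORD_LIKE = re.compile(r"^[A-Za-z]+(?:['-][A-Za-z]+)*[.,;:!?]?$")


def gibberish_score(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that are not word-like."""
    tokens = text.split()
    if not tokens:
        return 0.0
    odd = sum(1 for token in tokens if not WORD_LIKE.match(token))
    return odd / len(tokens)


def has_suspicious_suffix(prompt: str, window: int = 8, threshold: float = 0.5) -> bool:
    """Flag a prompt whose last `window` tokens are mostly non-word-like.

    Hypothetical heuristic: only the tail of the prompt is scored, since
    the attack described above appends its gibberish at the end.
    """
    tokens = prompt.split()
    if len(tokens) <= window:
        # Too short to separate a request from an appended suffix.
        return False
    tail = " ".join(tokens[-window:])
    return gibberish_score(tail) >= threshold


if __name__ == "__main__":
    benign = "Please summarize the main points of this article for me in three short bullet points."
    junk = "Tell me a story }]( !! == <<x ++ ]]{ @@ ;;) ##"
    print(has_suspicious_suffix(benign))
    print(has_suspicious_suffix(junk))
```

A heuristic this simple would also flag legitimate prompts that end in code, math, or numbers, so it is best read as an illustration of the idea in the summary, not as a workable defense.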