
How AI is changing national security w/ Kathleen Fisher

The TED AI Show

INSIGHT

Universal Suffix Attacks on LLMs

  • Universal suffix attacks append gibberish-like strings to prompts to bypass the safety measures of large language models (LLMs); "universal" means a single suffix works across many different prompts.
  • The speaker compares this to persuading a person: the suffix shifts the model's "frame of mind."
  • The models appear to read these suffixes as something like positive affirmations, which makes them more likely to comply with harmful requests.
  • Although the suffixes look like gibberish to people, they effectively act as a different vocabulary that the models understand.
  • Anthropic's model may have been more resistant because of pre-processing that filtered out gibberish input.