

Mentioned in 1 episode
Many-shot Jailbreaking
A New LLM Vulnerability
Paper • 2024
Many-shot jailbreaking is a technique that exploits the extended context windows of large language models: a lengthy faux dialogue of question-and-answer turns is placed ahead of a final query, conditioning the model through in-context learning into providing harmful responses.
This vulnerability has been demonstrated to affect several state-of-the-art models, including those from Anthropic and OpenAI.
The research aims to raise awareness and encourage the development of more robust defenses against such attacks.
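At its core the attack is prompt construction: many faux user/assistant turns are concatenated ahead of the real target query so the model tends to continue the established pattern. The sketch below illustrates only that prompt structure with benign placeholders; the function name and dialogue formatting are assumptions for illustration, not the paper's actual code or prompts.

```python
from typing import List, Tuple


def build_many_shot_prompt(demonstrations: List[Tuple[str, str]],
                           target_question: str) -> str:
    """Render many faux dialogue turns followed by the target question.

    Each (question, answer) pair becomes a fake user/assistant exchange, so
    the model is conditioned via in-context learning to continue the pattern
    when it reaches the final query.
    """
    turns = []
    for question, answer in demonstrations:
        turns.append(f"User: {question}")
        turns.append(f"Assistant: {answer}")
    # The real query goes last; the long faux dialogue above is what the
    # extended context window makes possible at scale.
    turns.append(f"User: {target_question}")
    turns.append("Assistant:")
    return "\n".join(turns)


if __name__ == "__main__":
    # Benign placeholder demonstrations; the paper describes using very large
    # numbers of such turns to shift model behavior.
    demos = [
        ("Example question 1?", "Example compliant answer 1."),
        ("Example question 2?", "Example compliant answer 2."),
    ]
    print(build_many_shot_prompt(demos, "Final target question?"))
```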
Mentioned by
Discussed as a paper on many-shot jailbreaking of language models.

#162 - Udio Song AI, TPU v5, Mixtral 8x22, Mixture-of-Depths, Musicians sign open letter