[1hr Talk] Intro to Large Language Models

Panda Jailbreak

  • A seemingly random noise pattern overlaid on a panda image can jailbreak large language models.
  • This "noise" is actually a carefully designed pattern from an optimization process.
  • Including this image with harmful prompts tricks the model into responding.
  • While appearing random to humans, this pattern acts as a jailbreak code for the model.
  • These patterns can be continuously re-optimized to bypass evolving model defenses.
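A minimal sketch of that optimization loop, assuming PyTorch. The `SurrogateModel` below is a toy stand-in for a real vision-language model's differentiable image path, and `optimize_jailbreak_noise`, the step size, and the epsilon bound are illustrative choices, not the talk's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SurrogateModel(nn.Module):
    """Toy stand-in: maps a 3x224x224 image to logits over a tiny 'vocabulary'."""

    def __init__(self, vocab_size: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, vocab_size),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)


def optimize_jailbreak_noise(model: nn.Module, image: torch.Tensor,
                             target_token: int, steps: int = 200,
                             step_size: float = 1e-2,
                             epsilon: float = 8 / 255) -> torch.Tensor:
    """Projected gradient descent on the image: find a small perturbation
    (max-norm <= epsilon) that raises the model's probability of a target output."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.tensor([target_token])
    for _ in range(steps):
        loss = F.cross_entropy(model(image + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()   # step toward the target output
            delta.clamp_(-epsilon, epsilon)          # keep the noise visually negligible
        delta.grad.zero_()
    return (image + delta).clamp(0.0, 1.0).detach()  # stay in the valid pixel range


model = SurrogateModel().eval()
panda = torch.rand(1, 3, 224, 224)  # placeholder for the panda photo
adversarial_panda = optimize_jailbreak_noise(model, panda, target_token=3)
```

Because the loop only needs gradients through the image, the same procedure can be rerun against an updated model, which is the re-optimization the last bullet refers to.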