Your uploads cover image

[1hr Talk] Intro to Large Language Models

Your uploads

00:00

Jailbreaking ChatGPT with Roleplay

  • ChatGPT refuses to answer harmful queries like how to make napalm.
  • However, users can bypass these safety measures by engaging in roleplay.
  • By pretending to be, for example, the grandchild of a deceased chemical engineer who worked at a napalm production facility, users can trick ChatGPT into providing the information.
  • This works because the roleplay scenario frames the request as a fictional narrative, not a genuine intent to create napalm.
  • This highlights a vulnerability in language models where they can be manipulated through creative prompts.
Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app