915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi

Super Data Science: ML & AI Podcast with Jon Krohn

Meta-Level Safeguards With Constitutional AI

  • Constitutional AI tries to enforce model behavior at a meta level rather than enumerating every bad input-output pair.
  • This approach targets the internal activations, or 'neurons', linked to unsafe behavior rather than relying only on input filters.