Super Data Science: ML & AI Podcast with Jon Krohn

915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi

70 snips
Aug 19, 2025
Michelle Yi, a tech leader and cofounder of Generationship, dives into the intriguing world of AI security. She discusses the methods hackers use to jailbreak AI systems and shares strategies for building trustworthy ones. The concept of 'red teaming' emerges as a critical tool in identifying vulnerabilities, while Yi also emphasizes the ethical implications of AI and the importance of community support for female entrepreneurs in tech. Get ready to explore the complexities of adversarial attacks and the steps needed to safeguard AI technologies!
Ask episode
AI Snips
Chapters
Books
Transcript
Episode notes
INSIGHT

Trust Needs Model And Data Defense

  • Trustworthy AI requires defending both the model and the data against adversarial influence and corruption.
  • Michelle Yi emphasizes technical defenses for model security and data integrity to prevent large-scale hallucinations and attacks.
ADVICE

Systematically Red Team Before Production

  • Do run systematic red teaming and automated evaluation before production to find edge cases.
  • Diverse testers and benchmark datasets reveal where models fail on rare or dangerous scenarios.
INSIGHT

Meta-Level Safeguards With Constitutional AI

  • Constitutional AI tries to enforce model behavior at a meta level rather than enumerating every bad input-output pair.
  • This approach targets internal activations or 'neurons' linked to unsafe behavior instead of only input filters.
Get the Snipd Podcast app to discover more snips from this episode
Get the app