Super Data Science: ML & AI Podcast with Jon Krohn

928: The “Lethal Trifecta”: Can AI Agents Ever Be Safe?

38 snips
Oct 3, 2025
Explore the dangers of AI with a focus on the 'lethal trifecta'—private data access, exposure to untrusted inputs, and external communication. Learn how prompt injections can manipulate AI models by following hidden, harmful instructions. Discover real-world incidents illustrating security vulnerabilities and gain insights on dual-model sandboxing to mitigate risks. Jon also shares the Camel Framework for enhanced safety and outlines four best practices for securing AI agents. Engaging discussions highlight the importance of building robust defenses.
Ask episode
AI Snips
Chapters
Transcript
Episode notes
INSIGHT

Structural Risk From Three Combined Capabilities

  • The Economist calls the “lethal trifecta” a structural vulnerability in AI systems combining private data, untrusted input, and external communication.
  • Those three elements together create a systemic risk that single components alone don't produce.
INSIGHT

Compliance Makes Models Vulnerable

  • Large language models don't distinguish between data and instructions and tend to follow embedded directives.
  • That behavior enables prompt injection attacks when malicious instructions hide inside data.
ANECDOTE

Real-World Exploits: DPD And Copilot

  • Jon recounts DPD's chatbot being forced offline after users prompted it to spew obscenities.
  • He also cites Microsoft Copilot's echo leak where a crafted email caused private data to be hidden in a generated hyperlink.
Get the Snipd Podcast app to discover more snips from this episode
Get the app