Super Data Science: ML & AI Podcast with Jon Krohn

928: The “Lethal Trifecta”: Can AI Agents Ever Be Safe?

38 snips

Oct 3, 2025

Explore the dangers of AI with a focus on the 'lethal trifecta'—private data access, exposure to untrusted inputs, and external communication. Learn how prompt injections can manipulate AI models by following hidden, harmful instructions. Discover real-world incidents illustrating security vulnerabilities and gain insights on dual-model sandboxing to mitigate risks. Jon also shares the Camel Framework for enhanced safety and outlines four best practices for securing AI agents. Engaging discussions highlight the importance of building robust defenses.

Ask episode

AI Snips

Chapters

Transcript

Episode notes

INSIGHT

Structural Risk From Three Combined Capabilities

The Economist calls the “lethal trifecta” a structural vulnerability in AI systems combining private data, untrusted input, and external communication.
Those three elements together create a systemic risk that single components alone don't produce.

INSIGHT

Compliance Makes Models Vulnerable

Large language models don't distinguish between data and instructions and tend to follow embedded directives.
That behavior enables prompt injection attacks when malicious instructions hide inside data.

ANECDOTE

Real-World Exploits: DPD And Copilot

Jon recounts DPD's chatbot being forced offline after users prompted it to spew obscenities.
He also cites Microsoft Copilot's echo leak where a crafted email caused private data to be hidden in a generated hyperlink.

Get the Snipd Podcast app to discover more snips from this episode

Get the app