

AI Agent Security: Threats & Defenses for Modern Deployments
May 21, 2025
Yifeng (Ethan) He, a PhD candidate at UC Davis specializing in software and AI security, and Peter Rong, a researcher focused on vulnerabilities in AI agents, discuss the critical threats facing AI agents. They break down issues like session hijacks and tool-based jailbreaks, highlighting the shortcomings of current defenses. The duo also advocates for effective sandboxing and agent-to-agent protocols, sharing practical strategies for securing AI deployments and emphasizing the importance of a zero-trust approach in agent security.
AI Snips
Chapters
Transcript
Episode notes
Agents Are State In Prompts
- AI agents encode their state in prompt history and chat context rather than traditional program state.
- This makes user and tool inputs critical attack surfaces that can change agent behavior.
How Research Started
- Peter described starting AI security research after seeing ChatGPT's code-writing evolution and questioning code safety.
- That grew into studying agent attack surfaces beyond just insecure code outputs.
User Data Can Poison Agents
- Fine-tuning agents with user interaction opens poisoning and backdoor risks if training data is untrusted.
- Malicious preferences or prompts can steer agent behavior subtly over time.