Lenny's Podcast: Product | Career | Growth

chevron_right

The coming AI security crisis (and what to do about it) | Sander Schulhoff

whatshot 800 snips

Dec 21, 2025

Guest

Sander Schulhoff

Sander Schulhoff, an expert in AI security and prompt engineering, discusses the alarming vulnerabilities of AI systems. He explains the difference between jailbreaks and prompt injection attacks, highlighting why current AI guardrails are ineffective. Schulhoff also warns that major security incidents are looming as AI capabilities grow. He advocates for merging classical cybersecurity with AI knowledge, emphasizes the importance of permission management, and suggests practical defensive strategies to protect organizations from emerging threats.

01:32:41

forum

Ask episode

web_stories

AI Snips

view_agenda

Chapters

auto_awesome

Transcript

info_circle

Episode notes

00:00 / 00:00

Guardrails Give False Security

Most deployed AI guardrails fail to stop determined attacks because attackers can adapt and find new prompts.
Sander Schulhoff asserts guardrails give a false sense of security and cannot be fully relied upon.

00:00 / 00:00

ServiceNow Second-Order Attack

ServiceNow Assist AI was tricked in a second-order prompt injection to recruit other agents to perform create/read/update/delete actions.
That attack demonstrated agents instructing more powerful agents to carry out unintended actions and send data externally.

00:00 / 00:00

Two Early Prompt Injection Cases

A Twitter chatbot was prompt-injected to make threats against the president, forcing the company to shut it down.
MathGPT was tricked into writing code that exfiltrated the OpenAI API key from its server.

Get the Snipd Podcast app to discover more snips from this episode

Sander's background in prompt engineering and red teaming

05:18 • 47sec

chevron_right

Defining jailbreaks versus prompt injection

06:04 • 4min

chevron_right

Real-world prompt injection examples

09:58 • 8min

chevron_right

Risks when agents and robots gain real-world control

18:27 • 2min

chevron_right

The AI security industry landscape

20:13 • 1min

chevron_right

Automated red teaming and guardrails explained

21:21 • 3min

chevron_right

Adversarial robustness and ASR metrics

23:58 • 2min

chevron_right

How enterprises buy and deploy guardrails

25:56 • 3min

chevron_right

Why automated red teaming 'works too well'

28:30 • 2min

chevron_right

Why guardrails fail in practice

30:31 • 15min

chevron_right

Practical advice: when you may not need defenses

45:12 • 2min

chevron_right

Lock down permissions with classical cybersecurity

47:12 • 4min

chevron_right

Re-architecting systems to avoid prompt injection

51:34 • 9min

chevron_right

Agent-specific threats and mailbox attacks

01:00:36 • 5min

chevron_right

Permission-restriction framework: Camel approach

01:05:12 • 4min

chevron_right

Education, roles, and building AI security teams

01:09:18 • 3min

chevron_right

Model-level defenses and adaptive evaluations

01:11:55 • 6min

chevron_right

Companies and products making useful progress

01:18:11 • 4min

chevron_right

Predictions: market correction and coming harms

01:22:05 • 4min

chevron_right

Final takeaways: don't over-rely on guardrails

Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs and companies. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, he’s spent more time than anyone alive studying how attackers break AI systems, and what he’s found isn’t reassuring: the guardrails companies are buying don’t actually work, and we’ve been lucky we haven’t seen more harm so far, only because AI agents aren’t capable enough yet to do real damage.

We discuss:

1. The difference between jailbreaking and prompt injection attacks on AI systems

2. Why AI guardrails don’t work

3. Why we haven’t seen major AI security incidents yet (but soon will)

4. Why AI browser agents are vulnerable to hidden attacks embedded in webpages

5. The practical steps organizations should take instead of buying ineffective security tools

6. Why solving this requires merging classical cybersecurity expertise with AI knowledge

—

Brought to you by:

Datadog—Now home to Eppo, the leading experimentation and feature flagging platform: https://www.datadoghq.com/lenny

Metronome—Monetization infrastructure for modern software companies: https://metronome.com/

GoFundMe Giving Funds—Make year-end giving easy: http://gofundme.com/lenny

—

Transcript: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis

—

My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/181089452/my-biggest-takeaways-from-this-conversation

—

Where to find Sander Schulhoff:

• X: https://x.com/sanderschulhoff

• LinkedIn: https://www.linkedin.com/in/sander-schulhoff

• Website: https://sanderschulhoff.com

• AI Red Teaming and AI Security Masterclass on Maven: https://bit.ly/44lLSbC

—

Where to find Lenny:

• Newsletter: https://www.lennysnewsletter.com

• X: https://twitter.com/lennysan

• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

—

In this episode, we cover:

(00:00) Introduction to Sander Schulhoff and AI security

(05:14) Understanding AI vulnerabilities

(11:42) Real-world examples of AI security breaches

(17:55) The impact of intelligent agents

(19:44) The rise of AI security solutions

(21:09) Red teaming and guardrails

(23:44) Adversarial robustness

(27:52) Why guardrails fail

(38:22) The lack of resources addressing this problem

(44:44) Practical advice for addressing AI security

(55:49) Why you shouldn’t spend your time on guardrails

(59:06) Prompt injection and agentic systems

(01:09:15) Education and awareness in AI security

(01:11:47) Challenges and future directions in AI security

(01:17:52) Companies that are doing this well

(01:21:57) Final thoughts and recommendations

—

Referenced:

• AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff (Learn Prompting, HackAPrompt): https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff

• The AI Security Industry is Bullshit: https://sanderschulhoff.substack.com/p/the-ai-security-industry-is-bullshit

• The Prompt Report: Insights from the Most Comprehensive Study of Prompting Ever Done: https://learnprompting.org/blog/the_prompt_report?srsltid=AfmBOoo7CRNNCtavzhyLbCMxc0LDmkSUakJ4P8XBaITbE6GXL1i2SvA0

• OpenAI: https://openai.com

• Scale: https://scale.com

• Hugging Face: https://huggingface.co

• Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition: https://www.semanticscholar.org/paper/Ignore-This-Title-and-HackAPrompt%3A-Exposing-of-LLMs-Schulhoff-Pinto/f3de6ea08e2464190673c0ec8f78e5ec1cd08642

• Simon Willison’s Weblog: https://simonwillison.net

• ServiceNow: https://www.servicenow.com

• ServiceNow AI Agents Can Be Tricked Into Acting Against Each Other via Second-Order Prompts: https://thehackernews.com/2025/11/servicenow-ai-agents-can-be-tricked.html

• Alex Komoroske on X: https://x.com/komorama

• Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack: https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack

• MathGPT: https://math-gpt.org

• 2025 Las Vegas Cybertruck explosion: https://en.wikipedia.org/wiki/2025_Las_Vegas_Cybertruck_explosion

• Disrupting the first reported AI-orchestrated cyber espionage campaign: https://www.anthropic.com/news/disrupting-AI-espionage

• Thinking like a gardener not a builder, organizing teams like slime mold, the adjacent possible, and other unconventional product advice | Alex Komoroske (Stripe, Google): https://www.lennysnewsletter.com/p/unconventional-product-advice-alex-komoroske

• Prompt Optimization and Evaluation for LLM Automated Red Teaming: https://arxiv.org/abs/2507.22133

• MATS Research: https://substack.com/@matsresearch

• CBRN: https://en.wikipedia.org/wiki/CBRN_defense

• CaMeL offers a promising new direction for mitigating prompt injection attacks: https://simonwillison.net/2025/Apr/11/camel

• Trustible: https://trustible.ai

• Repello: https://repello.ai

• Do not write that jailbreak paper: https://javirando.com/blog/2024/jailbreaks

—

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

—

Lenny may be an investor in the companies discussed.

To hear more, visit www.lennysnewsletter.com

Home Top podcasts Popular guests Top books