
The Quanta Podcast AI Filters Will Always Have Holes
11 snips
Jan 6, 2026 Michael Moyer, executive editor of Quanta Magazine and a seasoned science editor, dives into the intriguing world of AI filters and their vulnerabilities. He explains how cryptographers have found ways to exploit the very defenses meant to protect language models. The discussion covers common jailbreak techniques, including time-lock puzzles that mask forbidden prompts. Moyer highlights the inherent risks of filter weaknesses, emphasizing the paradox of maintaining safety while still enabling powerful AI. It’s a captivating blend of cryptography and AI ethics!
AI Snips
Chapters
Books
Transcript
Episode notes
Foundation Models Lack Built-In Morality
- Foundation models learn from raw internet data and contain no intrinsic morality or intent.
- Providers layer fine-tuning and external filters to steer behavior because retraining the base model repeatedly is costly.
Size Gap Creates Structural Vulnerabilities
- Filters are smaller, faster neural networks placed in front of larger models, creating a persistent "size gap."
- Cryptographers show that this size gap can be systematically exploited, not just via ad-hoc jailbreaks.
Early Jailbreak Tricks Worked Easily
- Early jailbreaks simply used prompts like "ignore previous instructions" to bypass filters and get forbidden content.
- Users also exploited tricks like appending nonsense characters or translating prompts into low-data languages to slip past filters.



