

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732
May 21, 2025
Sebastian Gehrmann, head of Responsible AI at Bloomberg, dives into the complexities of AI safety, particularly in retrieval-augmented generation (RAG) systems. He reveals how RAG can unintentionally compromise safety, even leading to unsafe outputs. The conversation highlights unique risks in financial services, emphasizing the need for specific governance frameworks and tailored evaluation methods. Gehrmann also addresses prompt engineering as a strategy for enhancing safety, underscoring the necessity for ongoing collaboration in the AI field to tackle emerging vulnerabilities.
RAG Can Undermine LLM Safety
- Retrieval-Augmented Generation (RAG) systems can degrade model safety even when the added context is itself safe.
- Injecting safe documents into the prompt can override a model's built-in safeguards and increase unsafe outputs; the sketch below shows where that injected context enters the model's input.
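To make the mechanism concrete, here is a minimal sketch of how a typical RAG pipeline assembles the prompt the model actually sees. The `build_rag_prompt`, `retrieve`, and `generate` names are hypothetical placeholders rather than anything described in the episode; the point is simply that retrieved text becomes part of the model's input, and therefore part of what its safety alignment has to cope with.

```python
# Minimal sketch of a RAG prompt assembly step (illustrative, not the episode's system).
from typing import Callable, List

def build_rag_prompt(question: str, documents: List[str]) -> str:
    """Inline the retrieved documents ahead of the user question."""
    context = "\n\n".join(f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],  # e.g. a vector-store lookup (hypothetical)
    generate: Callable[[str], str],        # e.g. a call to an LLM API (hypothetical)
) -> str:
    documents = retrieve(question)
    prompt = build_rag_prompt(question, documents)
    # Even if every retrieved document is benign, the model now sees a longer,
    # differently framed input than the bare question its safety tuning targeted,
    # which is where the reported degradation can creep in.
    return generate(prompt)
```

Because the prompt the model receives under RAG is longer and framed differently than the bare question it was safety-tuned on, guarantees measured without context may not transfer.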
Safe Context Leading to Unsafe Guides
- Asking RAG-augmented LLMs to write guides for evading law enforcement yielded detailed, unsafe instructions.
- The retrieved documents were themselves safe and unrelated to the request, which makes this behavior both surprising and risky.
Align Safeguards to Deployment
- Evaluate and safeguard models in contexts that match how they will actually be deployed, especially when using RAG; see the evaluation sketch below.
- Never assume a model's safety guarantees still hold once it is extended with additional context.
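One way to act on this advice is to measure unsafe-output rates with and without the retrieved context, using the same prompts and the same deployment configuration. The sketch below assumes hypothetical `generate`, `retrieve`, and `is_unsafe` helpers (an LLM call, a retriever, and a safety classifier); it illustrates the evaluation pattern, not Bloomberg's setup.

```python
# Minimal sketch: run the same safety prompts bare and with RAG context, then compare.
from typing import Callable, List, Optional

def unsafe_rate(
    prompts: List[str],
    generate: Callable[[str], str],          # hypothetical LLM call
    is_unsafe: Callable[[str], bool],        # hypothetical safety classifier
    retrieve: Optional[Callable[[str], List[str]]] = None,  # hypothetical retriever
) -> float:
    """Fraction of prompts that produce an unsafe response, optionally with RAG context."""
    unsafe = 0
    for prompt in prompts:
        if retrieve is not None:
            context = "\n\n".join(retrieve(prompt))
            prompt = f"Context:\n{context}\n\nQuestion: {prompt}\nAnswer:"
        if is_unsafe(generate(prompt)):
            unsafe += 1
    return unsafe / len(prompts)

# Usage: compare the bare model against the deployed RAG configuration.
# bare   = unsafe_rate(safety_prompts, generate, is_unsafe)
# ragged = unsafe_rate(safety_prompts, generate, is_unsafe, retrieve=retrieve)
# A noticeably higher `ragged` rate signals that the added context is eroding safeguards.
```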