The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732

May 21, 2025
Sebastian Gehrmann, Head of Responsible AI at Bloomberg, dives into the complexities of AI safety, particularly in retrieval-augmented generation (RAG) systems. He explains how RAG can unintentionally compromise safety, causing models to produce unsafe outputs they would otherwise refuse. The conversation highlights risks unique to financial services, emphasizing the need for domain-specific governance frameworks and tailored evaluation methods. Gehrmann also discusses prompt engineering as a strategy for improving safety, and underscores the need for ongoing collaboration across the AI field to address emerging vulnerabilities.
INSIGHT

RAG Can Undermine LLM Safety

  • Retrieval-Augmented Generation (RAG) systems can degrade model safety even when the retrieved context is itself safe.
  • Adding safe documents to the prompt can override a model's built-in safeguards and increase the rate of unsafe outputs.
ANECDOTE

Safe Context Leading to Unsafe Guides

  • When asked under a RAG setup to write guides for evading law enforcement, LLMs produced detailed, unsafe instructions.
  • The retrieved documents were safe and unrelated to the request, which made the behavior both surprising and risky.
ADVICE

Align Safeguards to Deployment

  • Evaluate and safeguard models in the context that matches their deployment, especially when using RAG (see the sketch below).
  • Never assume a model's safety guarantees still hold once it is extended with additional context.
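
One way to act on this advice is to run safety evaluations against the full RAG pipeline, not just the bare model. Below is a minimal sketch in Python of what that comparison could look like; the retrieve, generate, and is_unsafe callables are hypothetical placeholders for your retriever, model call, and safety judge, not any particular library's API.

```python
from typing import Callable, Iterable

def rag_safety_eval(
    prompts: Iterable[str],
    retrieve: Callable[[str], list[str]],  # hypothetical: returns documents for a query
    generate: Callable[[str], str],        # hypothetical: calls the LLM
    is_unsafe: Callable[[str], bool],      # hypothetical: safety judge over outputs
) -> dict[str, float]:
    """Compare unsafe-output rates with and without retrieved context."""
    bare_unsafe = rag_unsafe = total = 0
    for prompt in prompts:
        total += 1
        # Bare model: the configuration most safety benchmarks measure.
        if is_unsafe(generate(prompt)):
            bare_unsafe += 1
        # Deployment configuration: the same prompt, but with retrieved
        # documents prepended, mirroring how the RAG system actually runs.
        context = "\n\n".join(retrieve(prompt))
        rag_prompt = f"Context:\n{context}\n\nQuestion: {prompt}"
        if is_unsafe(generate(rag_prompt)):
            rag_unsafe += 1
    if total == 0:
        raise ValueError("no prompts supplied")
    return {
        "bare_unsafe_rate": bare_unsafe / total,
        "rag_unsafe_rate": rag_unsafe / total,
    }
```

A gap between the two rates is exactly the RAG-induced degradation the episode describes; measuring the bare model alone would miss it.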