

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732
May 21, 2025
Sebastian Gehrmann, head of Responsible AI at Bloomberg, dives into the complexities of AI safety, particularly in retrieval-augmented generation (RAG) systems. He reveals how RAG can unintentionally compromise safety, even leading to unsafe outputs. The conversation highlights unique risks in financial services, emphasizing the need for specific governance frameworks and tailored evaluation methods. Gehrmann also addresses prompt engineering as a strategy for enhancing safety, underscoring the necessity for ongoing collaboration in the AI field to tackle emerging vulnerabilities.
RAG Can Undermine LLM Safety
- Retrieval-Augmented Generation (RAG) systems can degrade model safety even when the added context is itself safe.
- Injecting safe documents into the prompt can override a model's built-in safeguards and increase unsafe outputs; the sketch below shows where that injected context enters the model's input.
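To make the mechanism concrete, here is a minimal sketch of how a typical RAG pipeline assembles the prompt the model actually sees. The `build_rag_prompt`, `retrieve`, and `generate` names are hypothetical placeholders rather than anything described in the episode; the point is simply that retrieved text becomes part of the model's input, and therefore part of what its safety alignment has to cope with.

```python
# Minimal sketch of a RAG prompt assembly step (illustrative, not the episode's system).
from typing import Callable, List

def build_rag_prompt(question: str, documents: List[str]) -> str:
    """Inline the retrieved documents ahead of the user question."""
    context = "\n\n".join(f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents))
    return (
        "Answer the question using the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],  # e.g. a vector-store lookup (hypothetical)
    generate: Callable[[str], str],        # e.g. a call to an LLM API (hypothetical)
) -> str:
    documents = retrieve(question)
    prompt = build_rag_prompt(question, documents)
    # Even if every retrieved document is benign, the model now sees a longer,
    # differently framed input than the bare question its safety tuning targeted,
    # which is where the reported degradation can creep in.
    return generate(prompt)
```

Because the prompt the model receives under RAG is longer and framed differently than the bare question it was safety-tuned on, guarantees measured without context may not transfer.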
Safe Context Leading to Unsafe Guides
- Asking RAG-augmented LLMs to write guides for evading law enforcement yielded detailed, unsafe instructions.
- The retrieved documents were themselves safe and unrelated to the request, which makes this behavior both surprising and risky.
Align Safeguards to Deployment
- Evaluate and safeguard models in contexts that match how they will actually be deployed, especially when using RAG; see the evaluation sketch below.
- Never assume a model's safety guarantees still hold once it is extended with additional context.
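One way to act on this advice is to measure unsafe-output rates with and without the retrieved context, using the same prompts and the same deployment configuration. The sketch below assumes hypothetical `generate`, `retrieve`, and `is_unsafe` helpers (an LLM call, a retriever, and a safety classifier); it illustrates the evaluation pattern, not Bloomberg's setup.

```python
# Minimal sketch: run the same safety prompts bare and with RAG context, then compare.
from typing import Callable, List, Optional

def unsafe_rate(
    prompts: List[str],
    generate: Callable[[str], str],          # hypothetical LLM call
    is_unsafe: Callable[[str], bool],        # hypothetical safety classifier
    retrieve: Optional[Callable[[str], List[str]]] = None,  # hypothetical retriever
) -> float:
    """Fraction of prompts that produce an unsafe response, optionally with RAG context."""
    unsafe = 0
    for prompt in prompts:
        if retrieve is not None:
            context = "\n\n".join(retrieve(prompt))
            prompt = f"Context:\n{context}\n\nQuestion: {prompt}\nAnswer:"
        if is_unsafe(generate(prompt)):
            unsafe += 1
    return unsafe / len(prompts)

# Usage: compare the bare model against the deployed RAG configuration.
# bare   = unsafe_rate(safety_prompts, generate, is_unsafe)
# ragged = unsafe_rate(safety_prompts, generate, is_unsafe, retrieve=retrieve)
# A noticeably higher `ragged` rate signals that the added context is eroding safeguards.
```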