RAG isn't a magic fix for search problems. It often works well at first, but most teams find it isn't good enough for production out of the box. The key is to improve it step by step, with solid evaluation and deliberate data creation.
Today, we are talking to Saahil Ognawala from Jina AI about how to evaluate RAG systems and improve them with synthetic data.
To build a good RAG system, you need three things: a way to evaluate it, methods to create training data, and a plan to improve it over time. Evaluation starts with a set of example queries your users might make. These should mix frequent "head" queries, moderately common "torso" queries, and rare "tail" queries that only show up now and then. This mix lets you measure whether a change makes the system better or worse.
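To make that concrete, here is a minimal sketch of such an evaluation harness in Python. Everything here is an assumption for illustration: `retrieve` stands in for your own search function returning ranked document IDs, and the queries, buckets, and relevant-ID sets would come from your own hand-labeled data.

```python
# Minimal sketch of an evaluation set mixing head, torso, and tail queries.
# All names and IDs are illustrative placeholders.

EVAL_QUERIES = [
    {"query": "chocolate cake recipe",       "bucket": "head",  "relevant": {"doc_12", "doc_7"}},
    {"query": "gluten-free chocolate cake",  "bucket": "torso", "relevant": {"doc_31"}},
    {"query": "vegan flourless torte agave", "bucket": "tail",  "relevant": {"doc_88"}},
]

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def evaluate(retrieve, k: int = 10) -> dict[str, float]:
    """Average recall@k per query bucket, so a regression on rare (tail)
    queries is not hidden by wins on frequent (head) ones."""
    per_bucket: dict[str, list[float]] = {}
    for item in EVAL_QUERIES:
        score = recall_at_k(retrieve(item["query"]), item["relevant"], k)
        per_bucket.setdefault(item["bucket"], []).append(score)
    return {bucket: sum(s) / len(s) for bucket, s in per_bucket.items()}
```

Reporting the score per bucket is the point: an average over all queries can look flat while tail queries quietly get worse.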
Synthetic data helps make the system stronger, especially at spotting wrong answers that look right, so-called hard negatives. Think of someone searching for a "gluten-free chocolate cake." A "sugar-free chocolate cake" might look like a good match because it shares many words, but it's wrong.
These tricky examples teach the model the difference between similar but different things.
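One common way to use hard negatives is to fine-tune an embedding model on triplets of query, correct match, and look-alike wrong match. The sketch below is one hedged illustration using the sentence-transformers library's classic triplet-loss training loop, not Jina's specific pipeline; the base model and example texts are assumptions.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Each training example pairs a query with a true positive and a
# lexically similar but wrong "hard negative", so the model learns
# to pull the positive close and push the negative away.
train_examples = [
    InputExample(texts=[
        "gluten-free chocolate cake",                           # query (anchor)
        "Flourless chocolate cake made with almond flour...",   # positive
        "Sugar-free chocolate cake sweetened with stevia...",   # hard negative
    ]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model
loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.TripletLoss(model=model)  # margin loss over (anchor, pos, neg)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```

In practice you would need many such triplets, which is exactly where synthetic generation comes in.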
When creating synthetic data, you need rules. The best way is to show the AI a few real examples and give it a list of topics to work with. Most teams find that using half real data and half synthetic data works best. This gives you enough variety while keeping things real.
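As an illustration, here is roughly what such a few-shot generation prompt can look like, using the OpenAI Python client; the model name, seed queries, and topic list are all placeholders, and any LLM client would do.

```python
from openai import OpenAI  # assumes the OpenAI SDK; any LLM client works

client = OpenAI()

# A handful of real user queries anchor the style (few-shot),
# and an explicit topic list constrains what gets generated.
SEED_QUERIES = [
    "gluten-free chocolate cake",
    "quick weeknight pasta for two",
    "how long to roast a whole chicken",
]
TOPICS = ["baking", "meal prep", "dietary restrictions"]

prompt = (
    "You generate realistic search queries for a recipe site.\n"
    "Real examples:\n"
    + "\n".join(f"- {q}" for q in SEED_QUERIES)
    + "\nGenerate 5 new queries, one per line, covering these topics: "
    + ", ".join(TOPICS)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model you have access to
    messages=[{"role": "user", "content": prompt}],
)
synthetic_queries = response.choices[0].message.content.splitlines()
```

The generated queries then get mixed with real ones, roughly the even split mentioned above, before labeling or training.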
Getting user feedback is hard with RAG. In normal search, you can see whether users click on results. But with RAG, the system composes its answer from many retrieved chunks. A good answer might draw on both good and bad chunks, so it's hard to know which ones helped. You need a way to track which chunks actually contributed to good answers.
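One simple starting point, sketched below with made-up names and a JSONL file as storage: record which chunks each answer was built from, then join that log against later user ratings to get per-chunk quality counts. It is a noisy signal, for exactly the reason above, but it is a usable one.

```python
import json
import time
from collections import defaultdict

def log_generation(answer_id: str, query: str, chunk_ids: list[str],
                   path: str = "rag_log.jsonl") -> None:
    """Record which retrieved chunks went into a generated answer."""
    with open(path, "a") as f:
        f.write(json.dumps({"answer_id": answer_id, "query": query,
                            "chunk_ids": chunk_ids, "ts": time.time()}) + "\n")

def chunk_feedback_stats(feedback: dict[str, bool],
                         path: str = "rag_log.jsonl") -> dict[str, list[int]]:
    """feedback maps answer_id -> user rating (True = helpful).
    Returns per-chunk [helpful, total] counts; noisy, because one
    answer can mix good and bad chunks, but enough to spot outliers."""
    stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["answer_id"] not in feedback:
                continue
            for cid in rec["chunk_ids"]:
                stats[cid][1] += 1
                stats[cid][0] += feedback[rec["answer_id"]]
    return stats
```

Chunks that show up in many downvoted answers are candidates for re-chunking, re-embedding, or removal.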
One key rule: don't make things harder than they need to be. If plain keyword search (for example, BM25) already works well enough, adding embedding-based semantic search may not be worth the extra complexity.
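That baseline is cheap to establish. Using the rank-bm25 package, for example, a few lines give you a keyword-search reference point to compare any embedding model against (the corpus here is made up):

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Baseline check before reaching for embeddings: if BM25 already
# handles your eval queries well, semantic search may add little.
corpus = [
    "Gluten-free chocolate cake with almond flour",
    "Sugar-free chocolate cake sweetened with stevia",
    "Classic vanilla sponge cake",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "gluten-free chocolate cake".lower().split()
print(bm25.get_top_n(query, corpus, n=2))
```

Run your evaluation set through this first; the gap between BM25 and an embedding model tells you what the extra machinery is actually buying you.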
Success with RAG comes from good testing, careful data creation, and steady improvements based on real use. It's not about using the newest AI models. It's about building good systems and processes that work reliably.
"It isn’t a magic wand you can place on your catalog and expect results you didn’t get before."
"Most of our enterprise users who have seen the most success in their RAG systems are the ones that implemented a continuous feedback mechanism very early."
"If you can't tell in real-time usage whether an answer is a bad answer or a right answer, because the LLM just makes it look like the right answer, then you only have your retrieval dataset to blame."
Saahil Ognawala:
Nicolay Gerold:
00:00 Introduction to Retrieval Augmented Generation (RAG)
00:29 Interview with Saahil Ognawala
00:52 Synthetic Data in Language Generation
01:14 Understanding the E5 Mistral Instructor Embeddings Paper
03:15 Challenges and Evolution in Synthetic Data
05:03 User Intent and Retrieval Systems
11:26 Evaluating RAG Systems
14:46 Setting Up Evaluation Frameworks
20:37 Fine-Tuning and Embedding Models
22:25 Negative and Positive Examples in Retrieval
26:10 Synthetic Data for Hard Negatives
29:20 Case Study: Marine Biology Project
29:54 Addressing Errors in Marine Biology Queries
31:28 Ensuring Query Relevance with Human Intervention
31:47 Few Shot Prompting vs Zero Shot Prompting
35:09 Balancing Synthetic and Real World Data
37:17 Improving RAG Systems with User Feedback
39:15 Future Directions for Jina and Synthetic Data
40:44 Building and Evaluating Embedding Models
41:24 Getting Started with Jina and Open Source Tools
51:25 The Importance of Hard Negatives in Embedding Models