Inference by Turing Post

chevron_right

Beyond the Hype: What Silicon Valley Gets Wrong About RAG. Amr Awadallah, founder & CEO of Vectara

Aug 23, 2025

Guest

Amr Awadallah

Amr Awadallah, founder and CEO of Vectara and a pioneer at Cloudera, dives deep into the world of retrieval-augmented generation (RAG). He argues that RAG isn't dead, despite trends toward larger context windows, emphasizing its role in separating memory from reasoning for accurate AI. Amr discusses the importance of retrieval with access control for trustworthy AI and critiques DIY RAG implementations. He also shares insights on hallucination detection, proposing guardian agents to enhance reliability while reflecting on the historical roots and future of AI.

23:55

forum

Ask episode

web_stories

AI Snips

view_agenda

Chapters

menu_book

Books

auto_awesome

Transcript

info_circle

Episode notes

insights

INSIGHT

Context Windows Aren't A RAG Replacement

Bigger context windows don't remove the need to pick relevant information for the model to reason well.
RAG separates memory (knowledge) from reasoning and yields better retrieval of key facts.

volunteer_activism

ADVICE

Protect Data With Controlled Retrieval

Use retrieval with access control to prevent prompt attacks from exposing sensitive data.
Let the retriever filter and only pass relevant, permitted information to the model.

insights

INSIGHT

Retrieval Is Far More Compute-Efficient

Retrieving into the model's context window scales compute roughly quadratically with words.
Smart retrieval systems are sublinear and far more efficient for large information sets.

Get the Snipd Podcast app to discover more snips from this episode

Why RAG isn’t dead despite larger context windows

00:44 • 1min

chevron_right

Separation of memory and reasoning for reliable AI

01:58 • 45sec

chevron_right

Retrieval with access control as a trust mechanism

02:44 • 1min

chevron_right

Industry adoption and why RAG remains the architecture of choice

03:53 • 56sec

chevron_right

Performance and computational advantages of smart retrieval

04:49 • 59sec

chevron_right

Why retrieval is more than plugging in a vector DB

05:48 • 22sec

chevron_right

Combining multiple data stores for effective RAG

06:09 • 1min

chevron_right

Productionizing RAG at scale

07:10 • 1min

chevron_right

How the ChatGPT moment influenced Vectara

08:31 • 2min

chevron_right

Addressing hallucinations after grounding retrieval

10:04 • 54sec

chevron_right

Fine-tuning and surgical fixes to reduce hallucination

10:58 • 2min

chevron_right

Guardian agents and human-in-the-loop workflows

12:52 • 57sec

chevron_right

Open sourcing hallucination detection for community adoption

13:50 • 1min

chevron_right

Extending hallucination detection to multimodal data

15:18 • 51sec

chevron_right

The true origins of RAG in information retrieval research

16:08 • 2min

chevron_right

Why hallucinations likely can't be fully eliminated

17:48 • 1min

chevron_right

Architectural limits and symbolic approaches to accuracy

19:15 • 19sec

chevron_right

Research focus on guardian agents and safety

19:33 • 1min

chevron_right

Defining AGI and timelines based on coding breakthroughs

20:34 • 2min

chevron_right

Books that shaped thinking about intelligence and stories

• Mentioned in 403 episodes

Sapiens

A Brief History of Humankind

Yuval Noah Harari

This book surveys the history of humankind from the Stone Age to the 21st century, focusing on Homo sapiens. It divides human history into four major parts: the Cognitive Revolution, the Agricultural Revolution, the Unification of Humankind, and the Scientific Revolution. Harari argues that Homo sapiens dominate the world due to their unique ability to cooperate in large numbers through beliefs in imagined realities such as gods, nations, money, and human rights. The book also examines the impact of human activities on the global ecosystem and speculates on the future of humanity, including the potential for genetic engineering and non-organic life.

#3152

• Mentioned in 14 episodes

Foundation Series

A Science Fiction Masterpiece

Isaac Asimov

The Foundation series, written by Isaac Asimov, is a seminal work of science fiction that spans over 550 years. It begins with the decline of the Galactic Empire, which has ruled for 12,000 years. Mathematician Hari Seldon develops the science of psychohistory, predicting the empire's fall and a subsequent 30,000-year dark age. To mitigate this, Seldon establishes the Foundation, a group of scientists and scholars on the planet Terminus, to compile and preserve human knowledge in the Encyclopedia Galactica. The series follows the Foundation's journey through various challenges and adaptations over millennia, exploring themes of governance, warfare, and science. The series was initially published as short stories and novellas between 1942 and 1950, later compiled into novels, and expanded upon in subsequent books.

In this episode of Inference, I sit down with Amr Awadallah – founder & CEO of Vectara, founder of Cloudera, ex-Google Cloud, and the original builder of Yahoo’s data platform – to unpack what’s actually happening with retrieval-augmented generation (RAG) in 2025.

We get into why RAG is far from dead, how context windows mislead more than they help, and what it really takes to separate reasoning from memory. Amr breaks down the case for retrieval with access control, the rise of hallucination detection models, and why DIY RAG stacks fall apart in production.

We also talk about the roots of RAG, Amr’s take on AGI timelines and what science fiction taught him about the future.

If you care about truth in AI, or you're building with (or around) LLMs, this one will reshape how you think about trustworthy systems.

Did you like the episode? You know the drill:

📌 Subscribe for more conversations with the builders shaping real-world AI.

💬 Leave a comment if this resonated.

👍 Like it if you liked it.

🫶 Thank you for watching and sharing!

Guest:

Amr Awadallah, Founder and CEO at Vectara

https://www.linkedin.com/in/awadallah/

https://x.com/awadallah

https://www.vectara.com/

📰 Want the transcript and edited version?

Subscribe to Turing Post: https://www.turingpost.com/subscribe

Chapters

00:00 – Intro

00:44 – Why RAG isn’t dead (despite big context windows)

01:59 – Memory vs reasoning: the case for retrieval

02:45 – Retrieval + access control = trusted AI

06:51 – Why DIY RAG stacks fail in production

09:46 – Hallucination detection and guardian agents

13:14 – Open-source strategy behind Vectara

16:08 – Who really invented RAG?

17:30 – Can hallucinations ever go away?

20:27 – What AGI means to Amr

22:09 – Books that shaped his thinking

Turing Post is a newsletter about AI's past, present, and future. Publisher Ksenia Se explores how intelligent systems are built – and how they’re changing how we think, work, and live.

Things mentioned during the interview:

Hughes Hallucination Evaluation Model (HHEM) Leaderboard https://huggingface.co/spaces/vectara/leaderboard

HHEM 2.1: A Better Hallucination Detection Model and a New Leaderboard

https://www.vectara.com/blog/hhem-2-1-a-better-hallucination-detection-model

HCMBench: an evaluation toolkit for hallucination correction models

https://www.vectara.com/blog/hcmbench-an-evaluation-toolkit-for-hallucination-correction-models

Books:

Foundation series by Isaac Asimov https://en.wikipedia.org/wiki/Foundation_(novel_series)

Sapiens: A Brief History of Humankind Hardcover by Yuval Noah Harari https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari/dp/0062316095

Setting the Record Straight on who invented RAG

https://www.linkedin.com/pulse/setting-record-straight-who-invented-rag-amr-awadallah-8cwvc/

https://x.com/TheTuringPost

https://www.linkedin.com/in/ksenia-se

https://huggingface.co/Kseniase

Home Top podcasts Popular guests Top books