Building and Deploying Real-World RAG Applications with Ram Sriharsha - #669

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

NOTE

Decouple storage and compute for cost-effective workflows

Reimagining a traditional monolithic architecture as a serverless one means moving away from a search engine architecture, where data is split into shards, and decoupling storage from compute. The search engine architecture is optimized for high, web-scale query volumes that touch the entire corpus; it becomes expensive when queries arrive on demand or touch only a fraction of the corpus. Decoupling storage from compute is crucial for making workloads such as generative AI workflows roughly 10x more cost-effective, a pattern observed with Pinecone customers who ran into exactly these cost challenges on the search engine architecture.
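As a rough illustration of the decoupled model, here is a minimal Python sketch, not Pinecone's actual implementation: partitions of the vector corpus live in cheap object storage, and a query pulls in only the few partitions whose centroids are nearest to it, rather than keeping every shard resident on compute nodes. The names ObjectStore, Partition, and serverless_query are hypothetical.

# Illustrative sketch only (assumed design, not Pinecone's implementation):
# corpus partitions sit in blob storage; compute fetches just the relevant
# ones per query, so idle data costs only storage.

import math
import random
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A chunk of the corpus: a centroid plus its member vectors."""
    centroid: list[float]
    vectors: list[list[float]] = field(default_factory=list)


class ObjectStore:
    """Stand-in for cheap blob storage (e.g., S3)."""

    def __init__(self):
        self._blobs: dict[str, Partition] = {}

    def put(self, key: str, partition: Partition) -> None:
        self._blobs[key] = partition

    def get(self, key: str) -> Partition:
        # In a real system this is a paid-per-query network fetch.
        return self._blobs[key]


def serverless_query(store: ObjectStore, centroids: dict[str, list[float]],
                     query: list[float], top_partitions: int = 2,
                     k: int = 3) -> list[list[float]]:
    """Touch only the partitions nearest the query, instead of scanning
    every shard the way a search-engine-style deployment would."""
    nearest = sorted(centroids, key=lambda key: math.dist(centroids[key], query))
    candidates = [v for key in nearest[:top_partitions] for v in store.get(key).vectors]
    return sorted(candidates, key=lambda v: math.dist(v, query))[:k]


if __name__ == "__main__":
    random.seed(0)
    store, centroids = ObjectStore(), {}
    # Build 10 partitions; a monolithic shard would keep all of them in memory.
    for i in range(10):
        centroid = [random.random(), random.random()]
        vectors = [[c + random.gauss(0, 0.05) for c in centroid] for _ in range(100)]
        store.put(f"part-{i}", Partition(centroid, vectors))
        centroids[f"part-{i}"] = centroid

    print(serverless_query(store, centroids, query=[0.5, 0.5]))

The cost argument follows directly from the sketch: per-query compute scales with the handful of partitions fetched, not with the size of the whole corpus kept warm on shard nodes.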
