
Building and Deploying Real-World RAG Applications with Ram Sriharsha - #669
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Decouple storage and compute for cost-effective workflows
Reimagining a traditional monolithic architecture as serverless means moving away from the classic search engine design, in which data is split into shards, and toward decoupling storage from compute. The sharded search engine architecture is optimized for high query volumes that touch the entire corpus, such as web-scale search, but it becomes expensive when queries arrive on demand or touch only a fraction of the corpus. Decoupling storage from compute is crucial for making workflows, like generative AI workflows, 10x more cost-effective, a pattern observed with Pinecone customers who faced these cost challenges under the search engine architecture.
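To make the trade-off concrete, here is a minimal, hypothetical sketch in Python contrasting the two query paths: a sharded index that keeps every shard resident on compute versus a storage/compute-decoupled index that pulls only the segments a query is routed to from cheap object storage. The names (`BlobStore`, `ShardedIndex`, `ServerlessIndex`, the toy `router`) are illustrative assumptions, not Pinecone's actual implementation or API.

```python
"""Illustrative sketch: sharded search vs. storage/compute decoupling.
All class and function names here are hypothetical, for explanation only."""
from dataclasses import dataclass, field
import numpy as np


@dataclass
class BlobStore:
    """Stands in for cheap object storage: index segments live here, not on compute."""
    segments: dict = field(default_factory=dict)

    def put(self, key: str, vectors: np.ndarray) -> None:
        self.segments[key] = vectors

    def get(self, key: str) -> np.ndarray:
        return self.segments[key]


class ShardedIndex:
    """Classic search-engine style: every shard stays resident in memory
    (i.e. on paid compute), whether or not incoming queries touch it."""

    def __init__(self, shards: list[np.ndarray]):
        self.shards = shards  # all shards loaded up front

    def query(self, q: np.ndarray, k: int = 3) -> np.ndarray:
        # Every query scans every shard of the corpus.
        scores = np.concatenate([shard @ q for shard in self.shards])
        return np.argsort(-scores)[:k]


class ServerlessIndex:
    """Storage and compute decoupled: only the segments a query is routed to
    are fetched from blob storage, scored, and then released."""

    def __init__(self, store: BlobStore, router):
        self.store = store
        self.router = router  # maps a query to the segment keys it needs

    def query(self, q: np.ndarray, k: int = 3) -> np.ndarray:
        keys = self.router(q)  # typically a small fraction of the corpus
        vectors = np.vstack([self.store.get(key) for key in keys])
        scores = vectors @ q
        return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segments = {f"seg-{i}": rng.normal(size=(1000, 64)) for i in range(10)}

    store = BlobStore()
    for key, vecs in segments.items():
        store.put(key, vecs)

    # Toy router: pretend routing metadata says one segment suffices.
    router = lambda q: ["seg-3"]
    q = rng.normal(size=64)

    print(ServerlessIndex(store, router).query(q))        # touches 1/10 of the data
    print(ShardedIndex(list(segments.values())).query(q))  # touches all of it
```

In the decoupled path, compute is only spent on the data a query actually needs, which is where the cost advantage for on-demand, partial-corpus workloads comes from.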