
Vector Podcast: Economical way of serving vector search workloads with Simon Eskildsen, CEO of Turbopuffer
Sep 19, 2025. Simon Eskildsen, CEO and co-founder of Turbopuffer and a former infrastructure engineer at Shopify, explains how Turbopuffer serves vector search workloads cost-effectively through its S3-native storage design. He discusses latency optimization, the architectural advantages that follow from building directly on object storage, the challenges of multi-tenancy, the importance of recall monitoring, and how LLM tools are changing coding practices.
Episode notes
Prototype Built From A Cost Problem
- Simon built the first Turbopuffer prototype alone after a Readwise recommendation project showed that cloud memory costs were untenable.
- Cursor later became an early customer and helped define features, such as mutable indexes, that made the product production-ready.
New Workload Needs New Storage
- Two ingredients make a new database: a new workload and a new storage architecture that changes economics.
- Object storage plus selective hot caching can make vector workloads orders of magnitude cheaper than DRAM-only designs.
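The economics claim above can be sketched with back-of-envelope arithmetic. All prices and the hot-data fraction below are illustrative assumptions, not figures from the episode:

```python
# Back-of-envelope cost comparison: hosting 100M 768-dim float32
# vectors entirely in DRAM vs. object storage plus a small hot
# NVMe cache. All $/GB-month prices are assumed for illustration.

DIMS = 768
N_VECTORS = 100_000_000
BYTES_PER_VECTOR = DIMS * 4                        # float32
dataset_gb = N_VECTORS * BYTES_PER_VECTOR / 1e9    # ~307 GB

DRAM_PER_GB_MONTH = 10.00   # assumed managed in-memory store price
S3_PER_GB_MONTH = 0.023     # assumed object storage price
SSD_PER_GB_MONTH = 0.08     # assumed NVMe cache-tier price
HOT_FRACTION = 0.10         # assume 10% of the data is hot

dram_cost = dataset_gb * DRAM_PER_GB_MONTH
tiered_cost = (dataset_gb * S3_PER_GB_MONTH
               + dataset_gb * HOT_FRACTION * SSD_PER_GB_MONTH)

print(f"dataset:    {dataset_gb:.0f} GB")
print(f"DRAM-only:  ${dram_cost:,.0f}/month")
print(f"S3 + cache: ${tiered_cost:,.0f}/month")
print(f"ratio:      {dram_cost / tiered_cost:.0f}x")
```

Under these assumptions the tiered design comes out a few hundred times cheaper, which is the "orders of magnitude" shift in economics the snip describes.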
Minimize Round Trips To Storage
- Design the system to minimize round trips to object storage and saturate network/disk bandwidth per request.
- Aim for a few round trips (three to four) so cold reads stay practical and warm reads match in-memory latencies.
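The round-trip budget above can be made concrete with a toy sketch. `ObjectStore`, its methods, and the key layout are invented for illustration, not Turbopuffer's actual API; the point is that a cold query issues a small, fixed number of sequential batches, with each batch's GETs running in parallel:

```python
# Hypothetical sketch: bounding a cold query to ~3 round trips to
# object storage by batching parallel GETs at each stage.

from concurrent.futures import ThreadPoolExecutor

class ObjectStore:
    """Toy stand-in for S3; one get_many() call models one round
    trip of latency, since its GETs are issued concurrently."""
    def __init__(self, blobs):
        self.blobs = blobs
        self.round_trips = 0

    def get_many(self, keys):
        self.round_trips += 1
        with ThreadPoolExecutor() as pool:
            return list(pool.map(self.blobs.__getitem__, keys))

def cold_query(store):
    meta = store.get_many(["meta"])[0]          # trip 1: index metadata
    parts = store.get_many(meta["partitions"])  # trip 2: candidate partitions
    keys = [k for p in parts for k in p["blocks"]]
    return store.get_many(keys)                 # trip 3: vector blocks

blobs = {
    "meta": {"partitions": ["p0", "p1"]},
    "p0": {"blocks": ["b0"]},
    "p1": {"blocks": ["b1"]},
    "b0": ["vec-a", "vec-b"],
    "b1": ["vec-c"],
}
store = ObjectStore(blobs)
blocks = cold_query(store)
print(store.round_trips)   # 3 round trips for a fully cold read
```

Because each stage fans out to many keys at once, the query saturates network bandwidth within a trip rather than paying per-object latency serially, which is what keeps cold reads practical.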
