Join Kaivalya Apte and Simon Hørup Eskildsen from Turbopuffer as they talk about the complexities of building a database on top of object storage. Discover the key challenges, the nuances of various storage formats, and the critical trade-offs involved.
Learn from Simon's rich experience, from his time at Shopify to creating Turbopuffer. This episode covers everything from write-ahead log approaches to multi-tenancy and advances in object storage. Perfect for database enthusiasts and those keen on first-principles thinking!
00:00 Introduction
00:17 Simon's Background and Journey to Turbopuffer
02:42 Challenges in Database Scalability
04:21 Experimenting with Vector Databases
05:02 Cost Implications of Vector Databases
05:52 Architectural Considerations for Search Workloads
07:39 Building a Database on Object Storage
16:14 Designing a Simple Database on Object Storage
26:01 Handling Multiple Writers and Consistency
31:26 Trade-offs in Write Operations
32:36 Optimizing MySQL Write Performance
34:03 Batching Writes in Object Storage
35:08 Time-Based vs Size-Based Batching
36:32 Understanding Amplification in Databases
42:26 Challenges with Cold Queries
44:02 Building and Persisting B-Trees
50:53 Separating Workloads in Databases
56:07 Multi-Tenancy Challenges
01:00:39 Choosing Storage Formats
01:06:10 Key Innovations in Object Storage Databases
Important links:
- https://github.com/sirupsen/napkin-math (numbers)
- https://turbopuffer.com/
- https://turbopuffer.com/architecture
- https://sirupsen.com/napkin/problem-10-mysql-transactions-per-second
- https://sirupsen.com (my blog, napkin math)
- https://sirupsen.com/subscribe (napkin math newsletter)
- https://github.com/rkyv/rkyv (rkyv, Rust)
Become a member of The GeekNarrator to get access to member-only videos, notes, and a monthly 1:1 with me.
Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, and SQLite. Use the link below to sign up and get 40% off a paid subscription.
https://app.codecrafters.io/join?via=geeknarrator
If you like this episode, please hit the like button and share it with your network.
Also, please subscribe if you haven't yet.
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning!