The GeekNarrator

Latest episodes

Dec 2, 2024 • 1h 8min

How would you design a database on Object Storage?

Join Kaivalya Apte and Simon Hørup Eskildsen from Turbopuffer as they talk about the complexities of building a database on top of object storage. Discover the key challenges, the nuances of various storage formats, and the critical trade-offs involved. Learn from Simon's rich experience, from his time at Shopify to creating Turbopuffer. This episode covers everything from approaches to write-ahead logs to multi-tenancy and object storage advancements. Perfect for database enthusiasts and those keen on first-principles thinking!

Chapters:
00:00 Introduction
00:17 Simon's Background and Journey to Turbopuffer
02:42 Challenges in Database Scalability
04:21 Experimenting with Vector Databases
05:02 Cost Implications of Vector Databases
05:52 Architectural Considerations for Search Workloads
07:39 Building a Database on Object Storage
16:14 Designing a Simple Database on Object Storage
26:01 Handling Multiple Writers and Consistency
31:26 Trade-offs in Write Operations
32:36 Optimizing MySQL Write Performance
34:03 Batching Writes in Object Storage
35:08 Time-Based vs Size-Based Batching
36:32 Understanding Amplification in Databases
42:26 Challenges with Cold Queries
44:02 Building and Persisting B-Trees
50:53 Separating Workloads in Databases
56:07 Multi-Tenancy Challenges
01:00:39 Choosing Storage Formats
01:06:10 Key Innovations in Object Storage Databases

Important links:
- https://github.com/sirupsen/napkin-math (numbers)
- https://turbopuffer.com/
- https://turbopuffer.com/architecture
- https://sirupsen.com/napkin/problem-10-mysql-transactions-per-second
- https://sirupsen.com (my blog, napkin math)
- https://sirupsen.com/subscribe (napkin math newsletter)
- https://github.com/rkyv/rkyv (rkyv, Rust)

Become a member of The GeekNarrator to get access to member-only videos, notes and a monthly 1:1 with me.

Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, SQLite. Use the link below to sign up and get 40% off a paid subscription.
https://app.codecrafters.io/join?via=geeknarrator

If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!
Dec 2, 2024 • 1h 1min

Practical Systems Learning & Verification with Jack Vanlightly

Welcome to The GeekNarrator podcast! In this episode, host Kaivalya Apte goes deeper into the practical applications of formal methods with Jack Vanlightly, a principal technologist at Confluent. With years of experience in distributed systems, Jack discusses his journey and how formal methods have been instrumental in system design verification and bug detection. The conversation covers Jack's background, his process of using formal methods, the significance of modelling, verification, documentation, and systems learning, as well as the future evolution of tooling and its applications. Tune in to understand how formal methods can transform your approach to distributed systems!

Chapters:
00:00 Introduction to the episode
00:37 Meet Jack Vanlightly: Principal Technologist at Confluent
02:17 Jack's Journey into Distributed Systems
04:29 Discovering the Power of Formal Methods
08:11 Modeling and Simulation in Formal Methods
13:43 Verification and Safety Properties
19:02 Documentation and Communication Challenges
20:43 Formal Methods as a Systems Learning Tool
24:26 Practical Applications and Case Studies
56:38 Future of Formal Verification and Closing Thoughts

Jack's Blog: https://jack-vanlightly.com/
Nov 7, 2024 • 1h 7min

Database Internals - NileDB Postgres re-engineered for multitenant apps

Join us in this episode as we dive deep into NileDB, a database re-engineered for multi-tenant applications. Our special guest, Gwen Shapira, co-founder of NileDB and a notable figure in the database field, shares her insights and technical know-how on solving the common challenges faced by multitenant SaaS applications. From the benefits of using Postgres as the underlying database to the unique tenant isolation features of NileDB, we cover it all. Don't miss out on learning about AI-native capabilities, handling schema migrations, and ensuring zero-downtime data migrations.

Chapters:
00:00 Introduction
07:19 Challenges in Multi-Tenant Databases
11:09 Tenant Isolation and NileDB's Approach
34:16 Necessary Modifications for Tenant Data
37:47 Zero Downtime Data Migrations
44:32 Handling Schema Migrations
59:11 AI Use Cases and Vector Embedding Storage
59:51 Technical and Non-Technical Learnings from Building Nile
01:05:03 Future Plans and Upcoming Features

NileDB: https://www.thenile.dev/
Blog: https://www.thenile.dev/blog
Gwen's LinkedIn: https://www.linkedin.com/in/gwenshapira
Gwen's Twitter: https://twitter.com/gwenshap

#postgres #sql #ai
Oct 19, 2024 • 57min

Building a continuous profiler with Frederic from Polar Signals

In this episode we chat with Frederic from Polar Signals. We dive deep into the intricacies of building a continuous profiler, the challenges faced, and the unique solutions developed by Polar Signals. Frederic shares insights from his background in observability and discusses the innovations in FrostDB, a custom columnar database designed for high-performance query and storage of profiling data.

Chapters:
00:00 Introduction
00:29 Frederic's Background
03:40 What is Continuous Profiling?
06:56 Challenges in Data Collection
18:22 Profiling Data Ingestion and Storage Architecture
27:23 Querying Data
28:52 High Cardinality Data and Cost Optimization
23:39 Tenant Isolation and Load Management
41:24 Performance Optimizations
46:02 Testing & Deterministic Simulation
50:33 Technical and Organizational Learnings
54:32 Future of Polar Signals
56:21 Conclusion

You can learn more about Polar Signals here: https://www.polarsignals.com/
#distributedsystems #systemdesign
Oct 11, 2024 • 1h 2min

Database Internals - SlateDB with Chris Riccomini

Welcome back to another episode! Today, I have a special guest, Chris Riccomini, joining me to delve into the world of databases. In this episode, we focus on SlateDB, a new database that's making waves in the tech community. We'll cover a wide range of topics, including the architecture of SlateDB, its internals, design decisions, and some fascinating use cases. Chris, a seasoned software engineer with a background at LinkedIn and WePay, shares his journey and the motivations behind creating SlateDB.

🎙️ Chapters:
00:00 Introduction to the Topic and Guest
01:58 Chris Riccomini's Background and Experience
04:19 The Genesis of SlateDB
04:54 Understanding SlateDB's Architecture
10:22 The Rise of Object Storage in Databases
13:43 Exploring SlateDB's Features and Trade-offs
32:54 Understanding Latency Trade-offs
34:12 Exploring Storage Formats and Manifest Files
37:25 Caching Strategies and Optimizations in SlateDB
50:21 Consistency Guarantees and Transactionality
52:36 Integration and Resource Management in SlateDB
56:04 Future Prospects and Use Cases for SlateDB

SlateDB: https://slatedb.io/
More about Chris: https://cnr.sh/

#distributedsystems #systemdesign #formalmethods
Sep 22, 2024 • 1h 16min

System Design the formal way with FizzBee

In this video I talk to Jayaprabhakar Kadarkarai, aka JP, the founder of FizzBee. FizzBee is a design specification language and model checker that helps developers verify their design before writing a single line of implementation code. We discuss where it is applicable, what the benefits are, how it works, and many other interesting challenges, with examples.

Chapters:
00:00 Introduction
01:13 Challenges in Designing Distributed Systems
03:13 Understanding Design Specification Languages
04:00 The Value of Structured Design Documents
09:00 When to Use Design Specification Languages
21:27 Modeling a Travel Booking System
22:51 Ensuring Atomicity in Distributed Systems
26:09 Handling Failures and Consistency
34:45 Refinement in System Design
35:38 Balancing Abstraction and Implementation
37:53 Common Pitfalls in Modeling and Implementation
40:02 Challenges in System Design and Implementation
40:12 Two-Way Feedback in System Design
41:01 Performance Considerations in Implementation
41:36 Importance of Solid Design Blueprints
41:56 Model-Based Testing and Continuous Integration
43:27 Updating Design Documentation
44:38 Simulation Testing vs. Model Checking
45:32 Design Issues and Formal Verification
49:51 Applying Formal Verification to Existing Systems
55:35 Common Design Problems and Solutions
01:07:57 Future Enhancements in Design Specification Tools
01:12:50 Getting Started with FizzBee

FizzBee: https://fizzbee.io/
Get in touch with JP: https://www.linkedin.com/in/jayaprabhakar

#distributedsystems #systemdesign #formalmethods
Aug 27, 2024 • 1h

Learnings from building Open Source Distributed Systems with Kishore Gopalakrishna

In this episode of The GeekNarrator podcast, hosted by Kaivalya Apte, we welcome a special guest, Kishore Gopalakrishna from StarTree, co-author of Apache Pinot and other notable projects. Kishore shares his extensive experience in building real-time analytics and streaming systems, including Apache Pinot, Espresso, Apache Helix, and ThirdEye. The episode delves into the motivations and challenges behind creating these systems, the innovations they brought to distributed systems, and the impact of community on open-source projects. Kishore also discusses the evolution of testing methodologies, cost optimizations in transactional and analytical systems, and key considerations for companies evaluating real-time analytics solutions. Don't miss this in-depth conversation for both seasoned developers and tech enthusiasts!

Chapters:
00:00 Introduction
03:13 Building Distributed Systems at LinkedIn
08:57 Testing and Challenges in Distributed Systems
30:50 Advantages of Columnar Storage
33:04 The Importance of Upserts
34:24 Building a Strong Open Source Community
41:10 Challenges and Lessons in System Design
51:35 Real-Time Analytics: Do You Need It?

StarTree: https://startree.ai/
Apache Pinot: https://pinot.apache.org/

#distributedsystems #kafka #s3 #streaming #realtimeanalytics #database #pinot #startree
Jul 19, 2024 • 1h 12min

WarpStream: A drop-in replacement for Kafka

In this episode of The GeekNarrator podcast, host Kaivalya Apte interviews Ryan and Richie, the founders of WarpStream. They discuss the architecture, benefits, and core functionalities of WarpStream, a drop-in replacement for Apache Kafka. The conversation covers their experience with Kafka, the design decisions behind WarpStream, and the operational challenges it addresses. They also delve into the migration process, the scalability and cost benefits, the integration with the Kafka ecosystem, and potential future features. This episode is a must-watch for developers and tech enthusiasts interested in modern, distributed data streaming solutions.

Chapters:
00:00 Introduction
02:27 Introducing WarpStream: A Kafka Replacement
11:07 Deep Dive into WarpStream's Architecture
35:42 Exploring Kafka's Ordering Guarantees
36:52 Handling Buffering and Compaction
38:44 Efficient Data Reading and File Caching
44:06 WarpStream's Flexibility and Cost Efficiency
01:06:59 Future Features

Links:
WarpStream: https://www.warpstream.com/
Blog: https://www.warpstream.com/blog
X:
Ryan: https://x.com/ryanworl
Richard Artoul: https://x.com/richardartoul
Kaivalya Apte: https://x.com/thegeeknarrator

#distributedsystems #kafka #s3 #streaming
Jul 19, 2024 • 1h 7min

XTDB - An Immutable SQL Database

Exploring XTDB with Jeremy Taylor & Malcolm Sparks: An In-Depth Dive into Immutability and Database Internals

In this episode of The GeekNarrator podcast, host Kaivalya is joined by Jeremy Taylor and Malcolm Sparks from Juxt to explore XTDB, an immutable database designed to handle complex historical and financial data with precision. They delve into the architecture, internal mechanics, and use cases while discussing the importance of immutability. Whether you're a developer interested in databases or someone curious about data management and history tracking, this discussion has something for you.

Chapters:
00:00 Introduction
02:51 Challenges with General Purpose Databases
11:50 XTDB: A New Approach to Databases
31:56 Understanding Kafka and XTDB Integration
36:06 Querying and Indexing in XTDB
40:31 Temporal Data Management and Use Cases
54:52 Deployment and User Experience

XTDB: https://xtdb.com/
XTDB GitHub: https://github.com/xtdb/xtdb
Juxt: https://www.juxt.pro/
Juxt GitHub: https://github.com/juxt

#sql #kafka #datastorage #immutable
Jul 19, 2024 • 1h 18min

Testing Distributed Systems the right way ft. Will Wilson

Will Wilson, Engineer and co-founder of Antithesis, dives deep into the world of deterministic simulation testing for distributed systems. He breaks down the limitations of traditional methods, showcasing how his company's approach improves software reliability. Key discussions include optimizing bug detection strategies, the significance of simulated workloads, and the challenges posed by third-party APIs. Real-world examples like chat applications illustrate how effective testing can reveal hidden issues, making this an essential listen for tech enthusiasts and developers.

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more
