The GeekNarrator

Kaivalya Apte
Jul 29, 2025 • 1h 24min

Building a new Database Query Optimiser - @cmu

Read more about Kafka Diskless Topics, a KIP by Aiven (KIP-1150): https://fnf.dev/3EuL7mv

Summary: In this conversation, Kaivalya Apte and Alexis Schlomer discuss the internals of query optimization through the new project optd. They explore the challenges faced by existing query optimizers, the importance of cost models, and the advantages of using Rust for performance and safety. The discussion also covers the innovative streaming model of query execution, feedback mechanisms for refining optimizations, and the future developments planned for optd, including support for various databases and enhanced cost models.

Chapters:
00:00 Introduction to optd and Its Purpose
03:57 Understanding Query Optimization and Its Importance
10:26 Defining Query Optimization and Its Challenges
17:32 Exploring the Limitations of Existing Optimizers
21:39 The Role of Calcite in Query Optimization
26:54 The Need for a Domain-Specific Language
40:10 Advantages of Using Rust for optd
44:37 High-Level Overview of optd's Functionality
48:36 Optimizing Query Execution with Coroutines
50:03 Streaming Model for Query Optimization
51:36 Client Interaction and Feedback Mechanism
54:18 Adaptive Decision Making in Query Execution
54:56 Persistent Memoization for Enhanced Performance
57:12 Guided Scheduling in Query Optimization
59:55 Balancing Execution Time and Optimization
01:01:43 Understanding Cost Models in Query Optimization
01:04:22 Exploring Storage Solutions for Query Optimization
01:07:13 Enhancing Observability and Caching Mechanisms
01:07:44 Future Optimizations and System Improvements
01:18:02 Challenges in Query Optimization Development
01:20:33 Upcoming Features and Roadmap for optd

References:
- NeuroCard: Learned Cardinality Estimation: https://vldb.org/pvldb/vol14/p61-yang.pdf
- RL-based QO: https://arxiv.org/pdf/1808.03196
- Microsoft book about QO: https://www.microsoft.com/en-us/research/publication/extensible-query-optimizers-in-practice/
- Cascades paper: https://15721.courses.cs.cmu.edu/spring2016/papers/graefe-ieee1995.pdf
- optd source code: https://github.com/cmu-db/optd
- optd website (for now): https://db.cs.cmu.edu/projects/optd/

For memberships, join this channel as a member here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join
Don't forget to like, share, and subscribe for more insights!
=============================================================================
Like building stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, and SQLite. Use the link below to sign up and get 40% off a paid subscription.
https://app.codecrafters.io/join?via=geeknarrator
=============================================================================
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning!
#database #queryoptimization #sql #postgres
Jul 29, 2025 • 1h 6min

Fast Observability on S3 with Parseable

Summary: In this conversation, Nitish Tiwari discusses Parseable, an observability platform designed to address the challenges of managing and analyzing large volumes of data. The discussion covers the evolution of observability systems, the design principles behind Parseable, and the importance of efficient data ingestion and storage in S3. Nitish explains how Parseable allows for flexible deployment, handles data organization, and supports querying through SQL. The conversation also touches on the correlation of logs and traces, failure modes, scaling strategies, and the optional nature of indexing for performance optimization.

References:
Parseable: https://www.parseable.com/
GitHub Repository: https://github.com/parseablehq/parseable
Architecture: https://parseable.com/docs/architecture

Chapters:
00:00 Introduction to Parseable and Observability Challenges
05:17 Key Features of Parseable
12:03 Deployment and Configuration of Parseable
18:59 Ingestion Process and Data Handling
32:52 S3 Integration and Data Organisation
35:26 Organising Data in Parseable
38:50 Metadata Management and Retention
39:52 Querying Data: User Experience and SQL
44:28 Caching and Performance Optimisation
46:55 User-Friendly Querying: SQL vs. UI
48:53 Correlating Logs and Traces
50:27 Handling Failures in Ingestion
53:31 Managing Spiky Workloads
54:58 Data Partitioning and Organisation
58:06 Creating Indexes for Faster Reads
01:00:08 Parseable's Architecture and Optimisation
01:03:09 AI for Enhanced Observability
01:05:41 Getting Involved with Parseable

#database #s3 #objectstorage #opentelemetry #logs #metrics
Jul 29, 2025 • 1h 17min

How does AWS Lambda work?

Summary: In this conversation, Kaivalya Apte and Rajesh Pandey talk about the engineering behind AWS Lambda, exploring its architecture, use cases, and best practices. They discuss the challenges of event handling, concurrency, and load balancing, as well as the importance of observability and testing in serverless environments. The conversation highlights the innovative solutions AWS Lambda provides for developers, emphasizing the balance between simplicity and complexity in cloud computing.

Chapters:
00:00 Introduction to AWS Lambda
04:36 Use Cases and Best Practices for AWS Lambda
09:34 Event Handling and Queue Management
19:41 Idempotency and Event Duplication Challenges
29:39 Cold Starts and Performance Optimization
34:37 Statelessness and Resource Management in Lambda
42:18 Understanding Micro-VMs and Cold Starts
45:14 Resource Management and Recommendations for Developers
47:04 Scaling and Back Pressure in Serverless Systems
51:33 Cellular Architecture and Fairness in Resource Allocation
55:23 Handling Problematic Events and Poison Pills
01:01:03 Testing and Operational Readiness in Lambda
01:14:11 Preparing for High Traffic Events

References:
Handling Billions of Invocations: https://aws.amazon.com/blogs/compute/handling-billions-of-invocations-best-practices-from-aws-lambda/
Firecracker: https://firecracker-microvm.github.io/
AWS Lambda: https://aws.amazon.com/lambda/

Connect with Rajesh:
https://x.com/RPandeyViews
https://www.linkedin.com/in/rajeshpandeyiiit/

#aws #awslambda #serverless #distributedsystems #scalability #reliability
Jul 29, 2025 • 1h 5min

Breaking Distributed Systems with Kyle Kingsbury from Jepsen

Summary: In this episode of The GeekNarrator podcast, host Kaivalya Apte interviews Kyle Kingsbury, a renowned expert in database and distributed systems safety analysis. They discuss testing distributed systems, the challenges involved, and common bugs and patterns. Kyle shares insights on the importance of understanding system documentation, the role of formal verification, and the balance between performance and safety in testing. He also offers advice for aspiring engineers in the field of distributed systems.

Chapters:
00:00 Introduction to Kyle Kingsbury and His Work
06:59 Common Bugs in Distributed Systems
12:37 Functional Bugs vs Safety Bugs
17:54 Changes in Testing Over the Years
26:03 False Positives and Negatives in Testing
32:33 The Importance of Experimentation in Testing
39:28 Tools and Technologies for Testing
48:58 The Role of Formal Verification
57:04 Reusability of Tests

Important links:
Distributed systems class: https://github.com/aphyr/distsys-class
Write your own distributed system: https://github.com/jepsen-io/maelstrom
Jepsen Analyses: https://jepsen.io/analyses

Key takeaways:
- Reading documentation is a crucial first step in testing systems.
- Testing distributed systems involves understanding their semantics and guarantees.
- Common bugs often arise from mismanagement of definite versus indefinite failures.
- Testing strategies for cloud-based systems require cooperation with providers.
- Performance testing can reveal unexpected behaviours in systems under stress.
- Formal verification remains a challenging but valuable tool in ensuring system safety.
- The testing process is iterative and requires collaboration with engineering teams.
- Aspiring engineers should immerse themselves in practical experience to build intuition.

#databasearchitecture #distributedsystems #cloudcomputing #testing #jepsen
Apr 7, 2025 • 1h 9min

How do vector (search) databases work? ft: turbopuffer

Simon Eskildsen, co-founder of turbopuffer and former infrastructure builder at Shopify, dives into the world of vector databases. He discusses the role of vector search in enhancing recommendation systems, alongside challenges like cost and scaling. Simon also shares insights on managing podcast episode archives using embeddings and indexing strategies. The conversation highlights the importance of observability in database performance and looks ahead to future trends in vector search technology.
Apr 7, 2025 • 1h 23min

Are your Data Pipelines Complex?

The GeekNarrator memberships can be joined here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join
Membership will get you access to member-only videos, exclusive notes, and a monthly 1:1 with me. Here you can see all the member-only videos: https://www.youtube.com/playlist?list=UUMO_mGuY4g0mggeUGM6V1osdA
------------------------------------------------------------------------------------------------------------------------------------------------------------------
About this episode:
------------------------------------------------------------------------------------------------------------------------------------------------------------------
In this conversation, Jacopo and Ciro discuss their journey in building Bauplan, a platform designed to simplify data management and enhance developer experience. They explore the challenges posed by data bottlenecks, the integration of development and production environments, and Bauplan's approach of using serverless functions and Git-like versioning for data. The discussion also touches on scalability, handling large data workloads, and the critical aspects of reproducibility and compliance in data management.

Chapters:
00:00 Introduction
03:00 The Data Bottleneck: Challenges in Data Management
06:14 Bridging Development and Production: The Need for Integration
09:06 Serverless Functions and Git for Data
17:03 Developer Experience: Reducing Complexity in Data Management
19:45 The Role of Functions in Data Pipelines: A New Paradigm
23:40 Building Robust Data Solutions: Versioning and Parameters
30:13 Optimizing Data Processing: Bauplan Runtime
46:46 Understanding Control Planes and Data Management
48:51 Ensuring Robustness in Data Pipelines
52:38 Data Quality and Testing Mechanisms
54:43 Branching and Collaboration in Data Development
57:09 Scalability and Resource Management in Data Functions
01:01:13 Handling Large Data Workloads and Use Cases
01:09:05 Reproducibility and Compliance in Data Management
01:16:46 Future Directions in Data Engineering and Use Cases

Links and References:
Bauplan website: https://www.bauplanlabs.com
Apr 6, 2025 • 1h 17min

Can Math simplify incremental compute?

In this episode of The GeekNarrator podcast, Lalit Suresh, CEO of Feldera, joins us to share insights on incremental view maintenance and its significance in modern data processing. We discuss the challenges posed by distributed systems, the mathematical foundation of DBSP, and how Feldera's architecture addresses these challenges. Performance optimization, handling late events, the future of stream processing, and the importance of SQL in creating efficient data workflows: it's all in here.

Chapters:
00:00 Introduction to Incremental View Maintenance
06:30 Challenges in Distributed Systems
11:46 Batch Processing vs Stream Processing
16:27 Understanding DBSP: The Mathematical Foundation
27:46 Architecture of Feldera and Data Flow
39:23 Partitioning and Storage Layer in Feldera
42:51 Understanding Co-Design Storage Layers
45:52 Foreground and Background Workers in DBSP
49:16 Tuning Background Workers for Performance
49:41 Synchronous Compute Model and View Propagation
51:35 Zsets and Batch Processing in Stream Workloads
54:00 Data Model Optimization in Feldera
57:22 Handling Late Events and Lateness in Feldera
01:01:18 Watermarks and Lateness Annotations
01:04:20 Error Handling and Idempotency in Feldera
01:11:05 Feldera's Differentiators and Future Roadmap
Mar 14, 2025 • 1h 5min

Redpanda - High Performance Streaming Platform for Data Intensive Applications

Dive into innovative engineering as Alex discusses Redpanda's unique architecture, which sets it apart from traditional messaging systems like Kafka. Unravel the complexities of optimizing memory management and latency for high-performance streaming. Explore the benefits of the 'thread per core' design for improved concurrency and reduced latency. Discover the importance of storage protocol correctness and the rigor of formal verification methods. This conversation highlights a future where streamlined data processing meets cutting-edge technology.
Mar 14, 2025 • 1h

Hosted PostgreSQL on bare metal and unikernels

In this episode, we talk to Søren Schmidt, Co-Founder and CEO of Prisma, about the evolution of Prisma from a backend-as-a-service to a popular ORM and now to Prisma Postgres. He shares insights into the challenges faced during this journey, the importance of user feedback, and the innovative architecture of Prisma Postgres, which leverages micro VMs for performance optimization. The conversation also touches on the complexities of managing data centers and the strategies employed to ensure a seamless user experience. Søren then goes into detail about Postgres snapshots, their impact on performance, and the mechanisms for fault tolerance. He explains how Pulse change data capture works and how Prisma Postgres simplifies database management for users.

Chapters:
00:00 Introduction to Prisma and Its Evolution
03:00 The Journey from ORM to Prisma Postgres
06:00 Simplifying Database Management
09:01 Understanding Prisma Postgres Architecture
12:12 The Role of Accelerate in Query Routing
14:51 Optimizing Query Processing with Micro VMs
18:12 Maintaining Postgres Integrity in a Micro VM Environment
21:07 User Experience and Community Feedback
23:57 Challenges of Data Center Management
27:09 Cold Starts and Performance Optimization
34:30 Understanding Snapshots in Postgres
38:55 Snapshot Mechanisms and Fault Tolerance
44:09 Change Data Capture with Pulse
55:07 Transitioning to Prisma Postgres
58:45 Community and Getting Started with Prisma Postgres

Some blogs worth checking out:
https://www.prisma.io/blog/prisma-postgres-the-future-of-serverless-databases
https://www.prisma.io/blog/cloudflare-unikernels-and-bare-metal-life-of-a-prisma-postgres-query
https://www.prisma.io/blog/announcing-prisma-postgres-early-access

Prisma Postgres relies heavily on the Unikraft project. There is a good introductory talk here: https://www.youtube.com/watch?v=n4wOyAuNhl0
And some very technical papers here: https://unikraft.org/community/papers

The best way to get started with Prisma Postgres is to go straight to https://www.prisma.io/
Mar 14, 2025 • 1h 18min

eBPF and continuous profiling with Frederic

In this episode, Kaivalya Apte and Frederic Branczyk talk about observability, focusing on continuous profiling and the role of eBPF. They discuss the evolution of profiling techniques, the importance of systematic data collection, and the challenges of keeping overhead low while gathering detailed performance metrics. Frederic shares insights from his extensive experience with Prometheus and Kubernetes, emphasizing the transformative impact of continuous profiling on software performance optimization.

The conversation then delves into the intricacies of eBPF (extended Berkeley Packet Filter) and its applications in profiling and performance analysis. It covers the capabilities of eBPF in extending the kernel safely, the mechanisms of user space profiling, and the handling of process terminations. It also explores memory and network profiling techniques, the challenges of profiling in different programming environments, and the limitations of eBPF in certain use cases. The conversation concludes with resources for those interested in learning more about eBPF and profiling techniques.

Chapters:
00:00 Introduction to Observability and Profiling
01:17 Frederic's Background and Expertise
02:11 The Importance of Continuous Profiling
06:46 The Value of Continuous Profiling
11:20 Understanding Profiling Data
19:09 Data Structures and Performance in Profiling
32:35 The Role of eBPF in Profiling
42:48 Introduction to eBPF and Its Capabilities
48:32 User Space Profiling and Memory Management
51:39 Handling Process Termination and Agent Recovery
55:27 Memory and Network Profiling Techniques
01:01:33 Profiling in Different Programming Environments
01:11:47 Use Cases and Limitations of eBPF in Profiling
01:13:54 Resources for Learning eBPF and Profiling Techniques
