The GeekNarrator

Latest episodes

Jan 20, 2024 • 1h 3min

VictoriaMetrics internals - Making monitoring simple and reliable at massive scale

Join creators Alex and Roman for a discussion of VictoriaMetrics, a highly scalable monitoring solution and time series database. Explore its origins, evolution, architecture, data ingestion, and integrations. Learn about the role of object storage and the importance of indexing, follow the data ingestion and selection paths, and hear about future plans for VictoriaMetrics.
Jan 20, 2024 • 55min

TiDB Internals with Li Shen

Join us on a deep dive into TiDB with Li Shen from PingCAP. In this episode, Li Shen explains what sets TiDB apart and how it tackles the scalability and reliability issues commonly associated with MySQL. If you are struggling with a MySQL cluster and looking for a more dependable and scalable system, TiDB might be the solution for you. Key topics include the architecture of TiDB, the journey of data from the API to the storage node, support for analytical use cases, the importance of database reliability, and the process of migrating to TiDB.
00:00 Introduction and Welcome
02:47 Defining TiDB: A Distributed SQL Database
04:55 The Role of MySQL Compatibility in TiDB
05:54 Primary Use Cases for TiDB
09:38 Understanding the Data Ingestion Process in TiDB
16:52 Understanding Indexing in TiDB
23:01 Pushing Down Table Scans and Partial Aggregation
24:39 Introduction to the Columnar Extension: TiFlash
24:54 Understanding Data Replication and Learner Nodes
26:23 Ensuring Strong Consistency in Data
27:12 Balancing Transactional and Analytical Use Cases
27:57 Understanding Data Replication and Consistency Model
28:42 Exploring the TiFlash Storage Layer
28:54 Understanding High Concurrency Insert and Update
32:09 Exploring the Read Path and Caching Mechanism
37:50 Understanding the Importance of High Reliability
43:01 Exploring Migration from Other Databases
48:01 Comparing TiDB with Other Distributed SQL Databases
52:21 Identifying Use Cases Where TiDB Might Not Be the Best Choice
Stay Curious! Keep Learning!
Jan 14, 2024 • 1h 15min

AI-Powered Database Optimisation with Andy Pavlo, OtterTune

In this video I discuss database tuning and optimisation with Andy Pavlo of OtterTune. Andy is an Associate Professor with Indefinite Tenure of Databaseology in the Computer Science Department at Carnegie Mellon University. His research interests are database management systems, specifically main memory systems, self-driving / autonomous architectures, transaction processing systems, and large-scale data analytics.
00:00 Introduction and Welcome
01:31 Understanding Database Optimization
05:48 Understanding When Database Tuning is Needed
08:45 Understanding Database Optimization Difficulties
16:16 Understanding Default Settings in Databases
22:35 Role of Machine Learning in Database Tuning
22:38 Introduction to OtterTune
28:36 Data Collection for Machine Learning Model
35:25 Deployment and Data Collection Process
38:03 Admitting the Limitations of Current Model
38:53 Challenges in Predicting Performance Improvements
39:28 The Importance of Data Collection Over Time
39:52 Avoiding Weekend and Holiday Tuning
40:05 Introducing New Features for Database Comparison
42:09 Provisioning Recommendations and Performance Predictions
43:03 The Importance of Telemetry in Understanding Database Performance
44:01 Handling Dramatic Changes in Database Workloads
44:48 Preparing for Predictable Traffic Spikes
48:13 The Importance of Testing in Database Optimization
53:33 The Future of Database Optimization
55:50 Common Mistakes in Database Management
01:09:15 The Future of Holistic Database Tuning
Links:
OtterTune: https://ottertune.com/
Andy Pavlo: https://www.cs.cmu.edu/~pavlo/
CMU YouTube: https://www.youtube.com/@UCHnBsf2rH-K7pn09rb3qvkA
Resources:
CMU: https://15799.courses.cs.cmu.edu/spring2022/schedule.html
OtterTune blog: https://ottertune.com/blog
===============================================================================
For a discount on the courses below:
AppSync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Use the "Add Discount" button and enter the discount code "geeknarrator" to get a 20% discount.
===============================================================================
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Stay Curious! Keep Learning!
Dec 6, 2023 • 1h 4min

DuckDB Internals with Mark Raasveldt

Deep Dive into DuckDB with CTO Mark Raasveldt. In this episode of the Geek Narrator podcast, host Kaivalya Apte talks with Mark Raasveldt, the CTO of DuckDB Labs, about his journey from database enthusiast to creator of DuckDB. They delve into how DuckDB, an analytical database, differs from other databases, its design decisions, its internal mechanisms, and much more. The episode also highlights the advantages of DuckDB for analytics, the motivation behind its ACID compliance, and how DuckDB handles ingestion, transaction isolation, mutations, and queries. Join in to learn how your data workloads can benefit from DuckDB.
00:00 Introduction and Guest Introduction
00:44 Guest's Journey into Databases
03:40 The Birth of DuckDB
04:30 Challenges with Existing Databases
05:15 Technical Difficulties
05:16 Why Existing Databases Fall Short for Data Scientists
09:16 The Role of SQLite and Its Limitations
13:59 Defining DuckDB
16:48 Comparing DuckDB with Other Analytical Databases
19:50 Deployment Models for DuckDB
22:47 Data ingestion into DuckDB
22:51 Data Ingestion in DuckDB
30:24 How DuckDB Handles Updates and Mutations
35:35 Understanding Column Granularity and Rewrites
35:58 Implications of Compression on Data Updates
36:38 Trade-offs in Row Group Size
37:32 Benefits of Column Storage Model
38:15 Row Groups and Parallelism
39:02 Choosing Row Group Size: An Experimental Approach
40:00 Handling Data Type Changes in Columns
41:00 Internal Data Structures in DuckDB
42:21 Reading Data: Point Lookups, Aggregations, and Joins
47:22 Optimization for Full Table Scans
53:49 Understanding ACID Compliance in DuckDB
55:49 Multi-Version Concurrency Control (MVCC) in DuckDB
59:50 Use Cases and Applications of DuckDB
01:01:42 The Story Behind DuckDB's Name
01:02:34 Future Vision for DuckDB
References:
DuckDB: https://duckdb.org/
Mark's blog: https://mytherin.github.io/
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning! Cheers, The GeekNarrator
Nov 25, 2023 • 55min

ScyllaDB internals with Felipe Mendes

In this episode we talk about ScyllaDB internals with Felipe Mendes.
Chapters:
0:00 ScyllaDB internals with Felipe Mendes
07:51 Write Path - API to Storage
11:40 What makes it faster than Cassandra?
13:39 Optimisations: Seastar, shard per core architecture
15:49 Optimisations: No Garbage collection and Custom Cache Implementation
18:15 Optimisations: Scheduling groups and IO priority classes
20:07 Optimisations: IO scheduler
22:55 Benefits of shard per core architecture
30:16 Write path - How is a coordinator chosen?
38:20 Read path
39:27 Read path optimisations - Index Caching
41:48 Shard vs Partition
43:10 Shard per core architecture tradeoff
44:03 Observability of Database
References:
ScyllaDB architecture: https://opensource.docs.scylladb.com/stable/architecture/
Seastar: https://seastar.io/
ScyllaDB Caching: https://www.scylladb.com/2018/07/26/how-scylla-data-cache-works/
Shard per core architecture: https://www.scylladb.com/product/technology/shard-per-core-architecture/
Database performance at Scale: https://www.scylladb.com/2023/10/02/introducing-database-performance-at-scale-a-free-open-source-book/
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Stay Curious! Keep Learning! Cheers, The GeekNarrator
Nov 9, 2023 • 1h 9min

Graph Database Internals: @neo4j with Michael Hunger

In this episode I talk to Michael Hunger from Neo4j about graph database internals (Neo4j).
Chapters:
0:00 Introduction and historical context
20:51 Data Modelling
25:16 Problem with SQL for Graph Model
26:21 Cypher - Query Language
28:23 Write Path
31:36 Neo4j Storage Layer
33:51 Graph API on top of Relational Model vs Native Graph Databases
37:05 Create Node Relationships
40:42 What makes a Graph Database's performance better?
46:00 Partitioning Strategy
53:20 Read path
59:27 Schema Migration
01:04:41 Graph database use cases
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Stay Curious! Keep Learning! Cheers, The GeekNarrator
Oct 21, 2023 • 59min

Rust vs C++, Java, Go with Micah Wylde

In this episode I talk to Micah Wylde about why #Rust could be the best choice for writing distributed systems and how it compares to #C++, #Java and #Go.
Chapters:
00:00 Introduction
03:48 History of Systems Programming
09:42 Is C++ coming back?
13:31 Problems with C++
16:24 Problems with Java
25:18 Problems with Go
31:21 Why did you choose Rust?
35:19 What makes Rust better?
41:49 Rust cannot save you from logical bugs
44:02 Problems in the context of Stream Processing
48:10 Challenges with Rust
51:28 Learning Rust
54:10 Future of Rust
56:41 A Summary
Blog mentioned in the discussion: https://www.arroyo.dev/blog/rust-for-data-infra
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Stay Curious! Keep Learning! Cheers, The GeekNarrator
Oct 15, 2023 • 34min

Becoming a better engineer - John Crickett

Hello everyone, in this podcast I have invited John Crickett, who has been a software engineer for 27 years and has vast experience across a variety of tech stacks. He is known for his newsletter "Coding Challenges", which helps developers build real-world applications and become better engineers.
00:00 Introduction
01:17 What made you start Coding Challenges?
03:21 What made you start learning Rust?
04:08 How should Software Engineers Prioritise learning? What should they learn? How would they know?
12:20 How to become a better engineer?
14:05 Knowing your passion, but how?
17:43 Should LeetCode be part of interviews? When does it (and doesn't it) make sense?
25:39 System Design interviews
29:38 Building as a community
More about Coding Challenges: https://codingchallenges.fyi
Join the Discord server: https://discord.com/invite/zv4RKDcEKV
Connect with John: https://www.linkedin.com/in/johncrickett/
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network. Cheers, The GeekNarrator
Oct 5, 2023 • 1h 8min

YugabyteDB Internals with Franck Pachot

Hey everyone, in this video I talk to Franck Pachot about the internals of YugabyteDB. Franck joined the show previously to talk about general database internals, and it's again a pleasure to host him and talk about Distributed SQL, YugabyteDB, ACID properties, PostgreSQL compatibility, and more.
Chapters:
00:00 Introduction
01:26 What does Cloud Native mean?
02:57 What is Distributed SQL?
03:47 Is Distributed SQL also based on sharding?
05:44 What problem does Distributed SQL solve?
07:32 Writes: Behind the scenes
10:59 Reads: Behind the scenes
17:01 B-Trees vs LSM: How is the data written to disk?
25:02 Why RocksDB?
29:52 How is data stored? Key Value?
33:56 Transactions: Complexity, SQL vs NoSQL
42:51 MVCC in YugabyteDB: How does it work?
45:08 Default Transaction Isolation level in YugabyteDB
51:57 Fault Tolerance & High Availability in Yugabyte
56:48 Thoughts on Postgres Compatibility and Future of Distributed SQL
01:03:53 Use cases not suitable for YugabyteDB
Previous videos:
Database Internals Part 1: https://youtu.be/DiLA0Ri6RfY?si=ToGv9NwjdyDE4LHO
Database Internals Part 2: https://youtu.be/IW4cpnpVg7E?si=ep2Yb-j_eaWxvRwc
Geo Distributed Applications: https://youtu.be/JQfnMp0OeTA?si=Rf2Y36-gnpQl18yj
Postgres Compatibility: https://youtu.be/2dtu_Ki9TQY?si=rcUk4tiBmlsFPYzY
I hope you liked this episode. Please hit the like button and subscribe to the channel for more.
Franck's Twitter and LinkedIn: https://twitter.com/FranckPachot and https://www.linkedin.com/in/franckpachot/
Connect and follow here: https://twitter.com/thegeeknarrator and https://www.linkedin.com/in/kaivalyaapte/
Keep learning and growing. Cheers, The GeekNarrator
Aug 23, 2023 • 44min

Accelerating Postgres Queries with Epsio - Gilad Kleinman

Hey everyone, in this video I talk to Gilad Kleinman, CEO and co-founder of epsio.io, about Epsio and how it helps companies run queries faster and cheaper.
Chapters:
00:00 Introduction
02:09 Defining the problem statement
07:17 What is Epsio?
09:58 How does Epsio change my architecture?
12:59 Use of CDC
14:05 Where is the query result stored? (Foreign data wrappers)
15:40 What permissions does Epsio need?
16:43 How does Epsio parse a query and create a virtual table?
24:15 Consistency model of Epsio
27:48 How do I know if Epsio is suitable for me?
31:41 How does it compare with Caching?
35:59 What metrics are available with Epsio?
38:32 What other databases does Epsio support (or will support)?
40:47 How to know more about Epsio?
41:37 Pricing model of Epsio
Read more about Epsio: https://www.epsio.io/
Docs: https://docs.epsio.io/
Foreign data wrappers: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
I hope you like this episode. Please hit the like button if you did, and subscribe to the channel if you haven't. Cheers, The GeekNarrator
