
The GeekNarrator
The GeekNarrator podcast is a show hosted by Kaivalya Apte, a software engineer who loves to talk about technology, technical interviews, self-improvement, best practices, and hustle.
Connect with Kaivalya Apte https://www.linkedin.com/in/kaivalya-apte-2217221a
Tech blogs: https://kaivalya-apte.medium.com/
Wanna talk? Book a slot here: https://calendly.com/speakwithkv/hey
Enjoy the show and please follow to get more updates. Also please don’t forget to rate and review the show.
Cheers
Latest episodes

Jan 20, 2024 • 1h 3min
VictoriaMetrics internals - Making monitoring simple and reliable at massive scale
Join the insightful discussion with Alex and Roman, the creators of VictoriaMetrics, a highly scalable monitoring solution and time series database. Explore its origins, evolution, architecture, and integrations; learn about the role of object storage and the importance of indexing; discover how data is ingested and selected; and hear about future plans for VictoriaMetrics.

Jan 20, 2024 • 55min
TiDB Internals with Li Shen
Join us on a deep dive into the intricacies of TiDB with Li Shen from PingCAP. In this episode, Li Shen provides a comprehensive exploration of TiDB, its unique features, and how it tackles the scalability and reliability issues commonly associated with MySQL.
If you're struggling with your MySQL cluster and looking for a more dependable and scalable system, TiDB might be the solution for you. This conversation covers various aspects of this cutting-edge database, including how it works, its use cases, and how it's optimized for different workloads.
Key topics include: the architecture of TiDB, the journey of data from API to storage node, embracing analytical use cases, the importance of database reliability, and the process of migrating to TiDB. Dive in now!
00:00 Introduction and Welcome
02:47 Defining TiDB: A Distributed SQL Database
04:55 The Role of MySQL Compatibility in TiDB
05:54 Primary Use Cases for TiDB
09:38 Understanding the Data Ingestion Process in TiDB
16:52 Understanding Indexing in TiDB
23:01 Pushing Down Table Scans and Partial Aggregation
24:39 Introduction to the Columnar Extension: TiFlash
24:54 Understanding Data Replication and Learner Nodes
26:23 Ensuring Strong Consistency in Data
27:12 Balancing Transactional and Analytical Use Cases
27:57 Understanding Data Replication and Consistency Model
28:42 Exploring the TiFlash Storage Layer
28:54 Understanding High-Concurrency Inserts and Updates
32:09 Exploring the Read Path and Caching Mechanism
37:50 Understanding the Importance of High Reliability
43:01 Exploring Migration from Other Databases
48:01 Comparing TiDB with Other Distributed SQL Databases
52:21 Identifying Use Cases Where TiDB Might Not Be the Best Choice
Stay Curious! Keep Learning!

Jan 14, 2024 • 1h 15min
AI-Powered Database Optimisation with Andy Pavlo, OtterTune
In this video I discuss database tuning and optimisation with Andy Pavlo of OtterTune.
Andy is an Associate Professor with Indefinite Tenure of Databaseology in the Computer Science Department at Carnegie Mellon University. His research interests are in database management systems, specifically main-memory systems, self-driving/autonomous architectures, transaction processing systems, and large-scale data analytics.
00:00 Introduction and Welcome
01:31 Understanding Database Optimization
05:48 Understanding When Database Tuning is Needed
08:45 Understanding Database Optimization Difficulties
16:16 Understanding Default Settings in Databases
22:35 Role of Machine Learning in Database Tuning
22:38 Introduction to OtterTune
28:36 Data Collection for the Machine Learning Model
35:25 Deployment and Data Collection Process
38:03 Admitting the Limitations of the Current Model
38:53 Challenges in Predicting Performance Improvements
39:28 The Importance of Data Collection Over Time
39:52 Avoiding Weekend and Holiday Tuning
40:05 Introducing New Features for Database Comparison
42:09 Provisioning Recommendations and Performance Predictions
43:03 The Importance of Telemetry in Understanding Database Performance
44:01 Handling Dramatic Changes in Database Workloads
44:48 Preparing for Predictable Traffic Spikes
48:13 The Importance of Testing in Database Optimization
53:33 The Future of Database Optimization
55:50 Common Mistakes in Database Management
01:09:15 The Future of Holistic Database Tuning
Links:
OtterTune: https://ottertune.com/
Andy Pavlo: https://www.cs.cmu.edu/~pavlo/
CMU YouTube: https://www.youtube.com/@UCHnBsf2rH-K7pn09rb3qvkA
Resources:
CMU: https://15799.courses.cs.cmu.edu/spring2022/schedule.html
OtterTune blog: https://ottertune.com/blog
===============================================================================
For a discount on the courses below:
Appsync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Click the "Add Discount" button and enter the code "geeknarrator" to get a 20% discount.
===============================================================================
Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator
If you like this episode, please hit the like button and share it with your network.
Also please subscribe if you haven't yet.

Dec 6, 2023 • 1h 4min
DuckDB Internals with Mark Raasveldt
Deep Dive into DuckDB with CTO Mark Raasveldt
Dive into database internals with The GeekNarrator podcast. In this episode, host Kaivalya Apte talks with Mark Raasveldt, the CTO of DuckDB Labs, about his journey from database enthusiast to creator of DuckDB. They delve into how DuckDB, an analytical database, differs from other databases, its design decisions, its internal mechanisms, and much more. The episode also highlights DuckDB's advantages for analytics, the motivation behind its ACID compliance, and how DuckDB handles ingestion, transaction isolation, mutations, and queries. Tune in to learn how your data workloads can benefit from DuckDB.
00:00 Introduction and Guest Introduction
00:44 Guest's Journey into Databases
03:40 The Birth of DuckDB
04:30 Challenges with Existing Databases
05:15 Technical Difficulties
05:16 Why Existing Databases Fall Short for Data Scientists
09:16 The Role of SQLite and Its Limitations
13:59 Defining DuckDB
16:48 Comparing DuckDB with Other Analytical Databases
19:50 Deployment Models for DuckDB
22:47 Data ingestion into DuckDB
22:51 Data Ingestion in DuckDB
30:24 How DuckDB Handles Updates and Mutations
35:35 Understanding Column Granularity and Rewrites
35:58 Implications of Compression on Data Updates
36:38 Trade-offs in Row Group Size
37:32 Benefits of Column Storage Model
38:15 Row Groups and Parallelism
39:02 Choosing Row Group Size: An Experimental Approach
40:00 Handling Data Type Changes in Columns
41:00 Internal Data Structures in DuckDB
42:21 Reading Data: Point Lookups, Aggregations, and Joins
47:22 Optimization for Full Table Scans
53:49 Understanding ACID Compliance in DuckDB
55:49 Multi-Version Concurrency Control (MVCC) in DuckDB
59:50 Use Cases and Applications of DuckDB
01:01:42 The Story Behind DuckDB's Name
01:02:34 Future Vision for DuckDB
References:
DuckDB: https://duckdb.org/
Mark's blog: https://mytherin.github.io/
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Cheers,
The GeekNarrator

Nov 25, 2023 • 55min
ScyllaDB internals with Felipe Mendes
In this episode we talk about ScyllaDB internals with Felipe Mendes.
Chapters:
0:00 ScyllaDB internals with Felipe Mendes
07:51 Write Path - API to Storage
11:40 What makes it faster than Cassandra?
13:39 Optimisations: Seastar, shard-per-core architecture
15:49 Optimisations: No garbage collection and custom cache implementation
18:15 Optimisations: Scheduling groups and IO priority classes
20:07 Optimisations: IO scheduler
22:55 Benefits of shard-per-core architecture
30:16 Write path - How is a coordinator chosen?
38:20 Read path
39:27 Read path optimisations - Index Caching
41:48 Shard vs Partition
43:10 Shard-per-core architecture tradeoff
44:03 Database observability
References:
ScyllaDB architecture: https://opensource.docs.scylladb.com/stable/architecture/
Seastar: https://seastar.io/
ScyllaDB Caching: https://www.scylladb.com/2018/07/26/how-scylla-data-cache-works/
Shard-per-core architecture: https://www.scylladb.com/product/technology/shard-per-core-architecture/
Database Performance at Scale: https://www.scylladb.com/2023/10/02/introducing-database-performance-at-scale-a-free-open-source-book/

Nov 9, 2023 • 1h 9min
Graph Database Internals: Neo4j with Michael Hunger
In this episode I talk to Michael Hunger from Neo4j about graph database internals.
Chapters:
0:00 Introduction and historical context
20:51 Data Modelling
25:16 Problem with SQL for Graph Model
26:21 Cypher - Query Language
28:23 Write Path
31:36 Neo4j Storage Layer
33:51 Graph API on top of Relational Model vs Native Graph Databases
37:05 Creating Node Relationships
40:42 What makes graph database performance better?
46:00 Partitioning Strategy
53:20 Read path
59:27 Schema Migration
01:04:41 Graph database use cases

Oct 21, 2023 • 59min
Rust vs C++, Java, Go with Micah Wylde
In this episode I talk to Micah Wylde about why Rust could be the best choice for writing distributed systems and how it compares to C++, Java, and Go.
Chapters:
00:00 Introduction
03:48 History of Systems Programming
09:42 Is C++ coming back?
13:31 Problems with C++
16:24 Problems with Java
25:18 Problems with Go
31:21 Why did you choose Rust?
35:19 What makes Rust better?
41:49 Rust cannot save you from logical bugs
44:02 Problems in the context of Stream Processing
48:10 Challenges with Rust
51:28 Learning Rust
54:10 Future of Rust
56:41 A Summary
Blog mentioned in the discussion: https://www.arroyo.dev/blog/rust-for-data-infra

Oct 15, 2023 • 34min
Becoming a better engineer - John Crickett
Hello Everyone,
In this podcast I have invited John Crickett, who has been a software engineer for 27 years and has vast experience with a variety of tech stacks. He is known for his newsletter "Coding Challenges", which helps developers build real-world applications and become better engineers.
00:00 Introduction
01:17 What made you start Coding Challenges?
03:21 What made you start learning Rust?
04:08 How should Software Engineers Prioritise learning? What should they learn? How would they know?
12:20 How to become a better engineer?
14:05 Knowing your passion? But how?
17:43 Should LeetCode be part of interviews? When does it (and doesn't it) make sense?
25:39 System Design interviews
29:38 Building as a community.
More about Coding Challenges: https://codingchallenges.fyi
Join the discord server: https://discord.com/invite/zv4RKDcEKV
Connect with John: https://www.linkedin.com/in/johncrickett/

Oct 5, 2023 • 1h 8min
YugabyteDB Internals with Franck Pachot
Hey Everyone,
In this video I talk to Franck Pachot about the internals of YugabyteDB. Franck has joined the show before to talk about general database internals, and it's a pleasure to host him again to talk about Distributed SQL, YugabyteDB, ACID properties, PostgreSQL compatibility, and more.
Chapters:
00:00 Introduction
01:26 What does Cloud Native mean?
02:57 What is Distributed SQL?
03:47 Is Distributed SQL also based on sharding?
05:44 What problem does Distributed SQL solve?
07:32 Writes: Behind the scenes.
10:59 Reads: Behind the scenes.
17:01 B-Trees vs LSM: How is the data written to disk?
25:02 Why RocksDB?
29:52 How is data stored? Key Value?
33:56 Transactions: Complexity, SQL vs NoSQL
42:51 MVCC in YugabyteDB: How does it work?
45:08 Default Transaction Isolation level in YugabyteDB
51:57 Fault Tolerance & High Availability in Yugabyte
56:48 Thoughts on Postgres Compatibility and Future of Distributed SQL
01:03:53 Use cases not suitable for YugabyteDB
Previous videos:
Database Internals:
Part1: https://youtu.be/DiLA0Ri6RfY?si=ToGv9NwjdyDE4LHO
Part2: https://youtu.be/IW4cpnpVg7E?si=ep2Yb-j_eaWxvRwc
Geo Distributed Applications: https://youtu.be/JQfnMp0OeTA?si=Rf2Y36-gnpQl18yj
Postgres Compatibility: https://youtu.be/2dtu_Ki9TQY?si=rcUk4tiBmlsFPYzY
I hope you liked this episode. Please hit the like button and subscribe to the channel for more.
Franck's Twitter and Linkedin: https://twitter.com/FranckPachot and https://www.linkedin.com/in/franckpachot/
Connect and follow here: https://twitter.com/thegeeknarrator and https://www.linkedin.com/in/kaivalyaapte/
Keep learning and growing.

Aug 23, 2023 • 44min
Accelerating Postgres Queries with Epsio - Gilad Kleinman
Hey Everyone, In this video I talk to Gilad Kleinman, CEO and Co-Founder of epsio.io, about Epsio and how it helps companies run queries faster and more cheaply.
Chapters:
00:00 Introduction
02:09 Defining the problem statement
07:17 What is Epsio?
09:58 How does Epsio change my architecture?
12:59 Use of CDC
14:05 Where is the query result stored? (Foreign data wrappers)
15:40 What permissions does Epsio need?
16:43 How does Epsio parse a query and create a virtual table?
24:15 Consistency model of Epsio
27:48 How do I know if Epsio is suitable for me?
31:41 How does it compare with Caching?
35:59 What metrics are available with Epsio?
38:32 What other databases does (or will) Epsio support?
40:47 How to know more about Epsio?
41:37 Pricing model of Epsio
Read more about epsio: https://www.epsio.io/
Docs: https://docs.epsio.io/
Foreign data wrappers: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
I hope you liked this episode. If you did, please hit the like button, and subscribe to the channel if you haven't already.