Data Engineering Podcast

Troubleshooting Kafka In Production

Dec 24, 2023

Elad Eldor, author of 'Kafka: Troubleshooting in Production', discusses the challenges of operating Kafka at scale and ways to mitigate potential issues. Topics include the importance of Kafka in the data pipeline, doubling retention in Kafka, managed vs. self-managed Kafka clusters, data lake complexity, monitoring for Kafka, troubleshooting unreplicated partitions, the cost of running Kafka in the cloud, and the need for a correlation tool.

Chapters

Journey into Working with Kafka

03:29 • 17min

Motivation for writing a book about troubleshooting Kafka in production

Managed Kafka or Self-Managed: Considerations and Parameters

Data Lake Complexity, Introduction to Starburst, and Mitigating Data Loss

Importance of Storage and Monitoring for Kafka

Understanding and Troubleshooting Unreplicated Partitions in Kafka

41:32 • 29min

The Cost of Running Kafka in the Cloud and the Need for a Correlation Tool

01:10:08 • 5min