Data Engineering Podcast

Tobias Macey

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Episodes

Mentioned books

Dec 24, 2023 • 1h 15min

Troubleshooting Kafka In Production

Elad Eldor, author of 'Kafka: Troubleshooting in Production', discusses the challenges of operating Kafka at scale and ways to mitigate potential issues. Topics include the importance of Kafka in the data pipeline, doubling retention in Kafka, managed vs. self-managed Kafka clusters, data lake complexity, monitoring for Kafka, troubleshooting unreplicated partitions, the cost of running Kafka in the cloud, and the need for a correlation tool.

Dec 18, 2023 • 56min

Adding An Easy Mode For The Modern Data Stack With 5X

The podcast discusses the challenges of the modern data stack and how 5X is pre-integrating the best tools from each layer to solve these issues. It explores the need for a centralized control plane, strategic investments in data capabilities, and the benefits of consolidating to a single solution. The speaker also shares insights on platform building, simplifying user experience, and lessons learned in introducing a new product category.

Dec 11, 2023 • 51min

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

The podcast discusses the Anomstack project and its goal of providing simple anomaly detection. They explore the definition and prioritization of metrics, implementation and optimization of AMSTAC, and extending the project's capabilities. They also touch on data lakes, Starburst analytics, and challenges in data management.

Dec 4, 2023 • 1h 4min

Designing Data Transfer Systems That Scale

In this podcast, Andrei Tserakhau discusses the challenges and solutions in data transfer and connectors, handling incremental data transfer and deduplication, building a user-friendly data transfer service, building scalable data engineering systems, building a data transfer product with change data capture and air-bike connectors, and challenges and gaps in data management tooling and technology.

14 snips

Nov 27, 2023 • 30min

Addressing The Challenges Of Component Integration In Data Platform Architectures

In this podcast, the host discusses the challenges of integrating components in data platform architectures, including user experience, data sharing and delivery, and shadow IT. They explore event-driven pipelines, access control, data flow ownership, and metadata propagation. The importance of reliable integrations and extensible systems is emphasized, along with tools like Open Lineage and DBT. Python and open metadata platforms are highlighted for simplifying integration and managing permissions and roles across data tools.

53 snips

Nov 20, 2023 • 1h 16min

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app

Data Engineering Podcast

Episodes

Mentioned books

Troubleshooting Kafka In Production

Adding An Easy Mode For The Modern Data Stack With 5X

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Designing Data Transfer Systems That Scale

Addressing The Challenges Of Component Integration In Data Platform Architectures

Unlocking Your dbt Projects With Practical Advice For Practitioners

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Shining Some Light In The Black Box Of PostgreSQL Performance

Surveying The Market Of Database Products

Defining A Strategy For Your Data Products

The AI-powered Podcast Player