

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Episodes

Dec 24, 2023 • 1h 15min
Troubleshooting Kafka In Production
Elad Eldor, author of 'Kafka: Troubleshooting in Production', discusses the challenges of operating Kafka at scale and ways to mitigate potential issues. Topics include the importance of Kafka in the data pipeline, doubling retention in Kafka, managed vs. self-managed Kafka clusters, data lake complexity, monitoring for Kafka, troubleshooting unreplicated partitions, the cost of running Kafka in the cloud, and the need for a correlation tool.

Dec 18, 2023 • 56min
Adding An Easy Mode For The Modern Data Stack With 5X
The podcast discusses the challenges of the modern data stack and how 5X is pre-integrating the best tools from each layer to solve these issues. It explores the need for a centralized control plane, strategic investments in data capabilities, and the benefits of consolidating to a single solution. The speaker also shares insights on platform building, simplifying user experience, and lessons learned in introducing a new product category.

Dec 11, 2023 • 51min
Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack
The podcast discusses the Anomstack project and its goal of providing simple anomaly detection. It explores the definition and prioritization of metrics, the implementation and optimization of Anomstack, and ways to extend the project's capabilities. It also touches on data lakes, Starburst analytics, and challenges in data management.

Dec 4, 2023 • 1h 4min
Designing Data Transfer Systems That Scale
In this podcast, Andrei Tserakhau discusses the challenges and solutions in data transfer and connectors, handling incremental data transfer and deduplication, building a user-friendly data transfer service, building scalable data engineering systems, building a data transfer product with change data capture and Airbyte connectors, and challenges and gaps in data management tooling and technology.

Nov 27, 2023 • 30min
Addressing The Challenges Of Component Integration In Data Platform Architectures
In this podcast, the host discusses the challenges of integrating components in data platform architectures, including user experience, data sharing and delivery, and shadow IT. He explores event-driven pipelines, access control, data flow ownership, and metadata propagation. The importance of reliable integrations and extensible systems is emphasized, along with tools like OpenLineage and dbt. Python and open metadata platforms are highlighted for simplifying integration and managing permissions and roles across data tools.

Nov 20, 2023 • 1h 16min
Unlocking Your dbt Projects With Practical Advice For Practitioners
Learn practical advice for building and scaling dbt projects, including adopting and using dbt, the complexities of data lakes, challenges with YAML in dbt projects, and the importance of planning and structure in dbt.

Nov 13, 2023 • 1h 8min
Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine
Eran Yahav, founder of Tabnine, discusses the journey of building an AI assistant for software engineers. Topics include advancements in AI code completion, the usage and effectiveness of Tabnine, challenges of customizing generative AI for software engineering, and future directions for Tabnine.

Nov 6, 2023 • 55min
Shining Some Light In The Black Box Of PostgreSQL Performance
Lukas Fittl, a database performance expert, discusses performance bottlenecks in PostgreSQL, tools like EXPLAIN in PostgreSQL, common optimization challenges, and the importance of tuning configuration parameters. He also shares insights on the development of pganalyze, enabling performance settings in PostgreSQL, and the evolution of database engines.

Oct 30, 2023 • 47min
Surveying The Market Of Database Products
Tanya Bragin, an experienced product manager for major vendors, shares insights on how to approach tool selection in the database market. Topics include open-source technologies, trends in the database market, challenges in data projects, the importance of data observability tools, and future trends.

Oct 23, 2023 • 1h 4min
Defining A Strategy For Your Data Products
Ranjith Raghunath shares his thoughts on building a strategy for data products, including centralizing vs decentralizing data product strategy, managing technical debt, and the importance of metrics in data product strategy.