
Data Engineering Podcast
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Latest episodes

Sep 1, 2024 • 39min
Enhancing Data Accessibility and Governance with Gravitino
Junping Du, an expert in data management and the creator of Gravitino, discusses how this open-source metadata service revolutionizes data accessibility and governance. He explains Gravitino's unified interface for querying diverse data sources, addressing challenges in managing both structured and unstructured data. Junping highlights the importance of centralized governance and the tool's architectural design that promotes operational efficiency. Additionally, he talks about bridging gaps between data and AI professionals to foster collaboration and innovation in the field.

Aug 4, 2024 • 54min
The Evolution of DataOps: Insights from DataKitchen's CEO
Chris Bergh, CEO of DataKitchen, is on a mission to simplify the lives of data engineers. He discusses the complexities of DataOps and the frustrations of constant system failures and client demands. Chris emphasizes the need for process-focused approaches over mere tools. He introduces DataKitchen's open-source tools, DataOps TestGen and Observability, which enhance data quality and monitoring. Stressing team dynamics, he advocates for collaboration and proactive communication to improve data workflows and overall job satisfaction.

Jul 28, 2024 • 49min
Achieving Data Reliability: The Role of Data Contracts in Modern Data Management
Tom Baeyens, an expert in data management, dives into the pivotal role of data contracts in ensuring reliability. He explains how these contracts act as guarantees for data quality and adherence to schemas. The discussion emphasizes the importance of robust testing and observability strategies to prevent issues in data pipelines. Baeyens also covers the collaboration required between data producers and consumers, along with the potential of generative AI to transform data contract management, paving the way for enhanced integrity in analytical data.

Jul 21, 2024 • 55min
How Generative AI Is Impacting Data Engineering Teams
Lior Gavish, co-founder of Monte Carlo, discusses how data teams are evolving to support AI features and incorporating AI into their own work. Topics include the impact of generative AI, integrating AI models into data workflows, challenges in building with generative AI, and optimizing AI solutions for data teams.

Jul 13, 2024 • 53min
The Role of Product Managers in Data-Centric Organizations
Praveen Gujjar, Director of Product at LinkedIn, discusses the evolving role of product managers in data-centric environments, emphasizing clean, reliable data. He highlights challenges in building scalable data platforms, platformization complexities, and the role of product managers in bridging engineering and business teams. Praveen provides insights on long-term planning, strong relationships with engineering teams, and the future of data management.

Jul 8, 2024 • 58min
Neon: A Serverless And Developer Friendly Postgres
Nikita Shamgunov shares the journey of creating Neon, a serverless Postgres solution. Topics include maintaining Postgres compatibility, using database branches for isolated environments, managing latency and reliability in deployments, the pgvector extension for AI applications, open source vs. business models, and future plans for Neon.

Jun 30, 2024 • 60min
Improve Data Quality Through Engineering Rigor And Business Engagement With Synq
Petr Janda, CEO of Synq, discusses the importance of data reliability and transparency, emphasizing treating data systems like engineering systems. Synq's platform helps teams manage incidents, track data dependencies, and ensure data quality. By integrating data into business processes, Synq empowers data teams to drive meaningful change and optimize data management.

Jun 23, 2024 • 53min
Stitching Together Enterprise Analytics With Microsoft Fabric
Dipti Borkar, an expert at Microsoft Fabric, discusses accelerating enterprise adoption of data lakehouse architectures. She shares experiences and use cases for the Fabric service, highlighting its integration with the Spark engine and XTable for seamless interoperability. The episode explores optimizing Fabric for enterprise use, innovations in data lake analytics, and the future role of AI in data engineering.

Jun 16, 2024 • 53min
Being Data Driven At Stripe With Trino And Iceberg
Learn how Stripe uses Trino and Iceberg for its data lakehouse, including insights on business analytics, challenges with large datasets, optimizing with Iceberg, and transitioning to a REST catalog. Discover the advantages of monitoring queries and managing a multi-tool ecosystem with Trino and Spark, and explore the challenges and innovations in cloud data management at Stripe.

Jun 9, 2024 • 42min
X-Ray Vision For Your Flink Stream Processing With Datorios
Dive into the world of Flink stream processing with Ronen Korman and Stav Elkayam from Datorios. They discuss how observability can enhance visibility into Flink internals, address challenges in real-time data processing, explore the role of Flink in AI applications, and highlight the evolution and integration of Datorios with Apache Flink for stream processing.