Data Engineering Podcast

Latest episodes

Sep 1, 2024 • 39min

Enhancing Data Accessibility and Governance with Gravitino

Junping Du, an expert in data management and the creator of Gravitino, discusses how this open-source metadata service revolutionizes data accessibility and governance. He explains Gravitino's unified interface for querying diverse data sources, addressing challenges in managing both structured and unstructured data. Junping highlights the importance of centralized governance and the tool's architectural design that promotes operational efficiency. Additionally, he talks about bridging gaps between data and AI professionals to foster collaboration and innovation in the field.
14 snips
Aug 4, 2024 • 54min

The Evolution of DataOps: Insights from DataKitchen's CEO

Chris Bergh, CEO of DataKitchen, is on a mission to simplify the lives of data engineers. He discusses the complexities of DataOps and the frustrations of constant system failures and client demands. Chris emphasizes the need for process-focused approaches over mere tools. He introduces DataKitchen's open-source tools, DataOps TestGen and Observability, which enhance data quality and monitoring. Stressing team dynamics, he advocates for collaboration and proactive communication to improve data workflows and overall job satisfaction.
26 snips
Jul 28, 2024 • 49min

Achieving Data Reliability: The Role of Data Contracts in Modern Data Management

Tom Baeyens, an expert in data management, dives into the pivotal role of data contracts in ensuring reliability. He explains how these contracts act as guarantees for data quality and adherence to schemas. The discussion emphasizes the importance of robust testing and observability strategies to prevent issues in data pipelines. Baeyens also covers the collaboration required between data producers and consumers, along with the potential of generative AI to transform data contract management, paving the way for enhanced integrity in analytical data.
Jul 21, 2024 • 55min

How Generative AI Is Impacting Data Engineering Teams

Lior Gavish, co-founder of Monte Carlo, discusses how data teams are evolving to support AI features and incorporating AI into their own work. Topics include the impact of generative AI, integrating AI models into data workflows, challenges in building with generative AI, and optimizing AI solutions for data teams.
Jul 13, 2024 • 53min

The Role of Product Managers in Data-Centric Organizations

Praveen Gujjar, Director of Product at LinkedIn, discusses the evolving role of product managers in data-centric environments, emphasizing clean, reliable data. He highlights challenges in building scalable data platforms, platformization complexities, and the role of product managers in bridging engineering and business teams. Praveen provides insights on long-term planning, strong relationships with engineering teams, and the future of data management.
Jul 8, 2024 • 58min

Neon: A Serverless And Developer Friendly Postgres

Nikita Shamgunov shares the journey of creating a serverless Postgres solution, Neon. Topics include maintaining Postgres compatibility, using database branches for isolated environments, managing latency and reliability in deployments, the pgvector extension for AI applications, open source vs. business models, and future plans for Neon.
13 snips
Jun 30, 2024 • 60min

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Petr Janda, CEO of Synq, discusses the importance of data reliability and transparency, emphasizing that data systems should be treated like engineering systems. Synq's platform helps teams manage incidents, track data dependencies, and ensure data quality. By integrating data into business processes, Synq empowers data teams to drive meaningful change and optimize data management.
20 snips
Jun 23, 2024 • 53min

Stitching Together Enterprise Analytics With Microsoft Fabric

Dipti Borkar, an expert at Microsoft Fabric, discusses accelerating enterprise adoption of data lakehouse architectures. She shares experiences and use cases for the Fabric service, highlighting its integration with the Spark engine and Apache XTable for interoperability. The episode explores optimizing Fabric for enterprise use, innovations in data lake analytics, and the projected role of AI in data engineering.
61 snips
Jun 16, 2024 • 53min

Being Data Driven At Stripe With Trino And Iceberg

Learn how Stripe uses Trino and Iceberg for its data lakehouse, including insights on business analytics, challenges with large datasets, optimizing with Iceberg, and transitioning to a REST catalog. Discover the advantages of monitoring queries and managing a multi-tool ecosystem with Trino and Spark, and explore the challenges and innovations in cloud data management at Stripe.
Jun 9, 2024 • 42min

X-Ray Vision For Your Flink Stream Processing With Datorios

Dive into the world of Flink stream processing with Ronen Korman and Stav Elkayam from Datorios. They discuss how observability can enhance visibility into Flink internals, address challenges in real-time data processing, explore the role of Flink in AI applications, and highlight the evolution and integration of Datorios with Apache Flink for stream processing.
