
Data Engineering Podcast
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Latest episodes

Oct 20, 2024 • 58min
Bring Vector Search And Storage To The Data Lake With Lance
Weston Pace, a software engineer at LanceDB and contributor to Arrow, discusses the intersection of vector databases and AI. He highlights how Lance integrates seamlessly with data lakes, offering fast access and efficient schema evolution. The focus on the Lance file format showcases its advantages over traditional storage methods, particularly for multimedia tasks. Weston elaborates on optimizing latency in AI applications and the importance of user-friendly tools in the evolving vector store ecosystem.

Oct 13, 2024 • 54min
The Role of Python in Shaping the Future of Data Platforms with DLT
Adrian Brudaru and Marcin Rudolf, co-founders of dltHub, share their insights on the transformative role of Python in data platforms. They discuss dlt as a versatile library that integrates with lakehouses and AI frameworks. The duo highlights the impact of high-performance libraries such as PyArrow on metadata management and parallel processing. They also explore the significance of interoperability and evolving governance challenges in data ingestion. Plans for a portable data lake promise to enhance user access and experience in data management.

Oct 6, 2024 • 43min
Build Your Data Transformations Faster And Safer With SDF
Lukas Schulte, Co-founder and CEO of SDF, dives into the revolutionary features of this SQL transformation tool designed for data privacy, governance, and quality. He discusses SDF's unique architecture built with Rust, enhancing both performance and reliability. Schulte explores the evolution of data transformation from static analysis to type-safe execution. He highlights the crucial role of classifiers in data governance and the ongoing development plans, including support for Python models, aimed at further improving developer workflows.

Sep 23, 2024 • 57min
Scaling Airbyte: Challenges and Milestones on the Road to 1.0
Michel Tricot, a key figure in the development of Airbyte, discusses the significant milestones leading to the platform's anticipated 1.0 launch. He shares insights on evolving from simplicity to sophisticated integrations while addressing industry shifts and user feedback. Michel delves into the challenges of scaling an open-source product and into innovative applications of Airbyte technology, such as cache warmup with Redis. He also highlights future enhancements, including improved operational support and the introduction of a Connector Marketplace.

Sep 1, 2024 • 39min
Enhancing Data Accessibility and Governance with Gravitino
Junping Du, an expert in data management and the creator of Gravitino, discusses how this open-source metadata service revolutionizes data accessibility and governance. He explains Gravitino's unified interface for querying diverse data sources, addressing challenges in managing both structured and unstructured data. Junping highlights the importance of centralized governance and the tool's architectural design that promotes operational efficiency. Additionally, he talks about bridging gaps between data and AI professionals to foster collaboration and innovation in the field.

Aug 4, 2024 • 54min
The Evolution of DataOps: Insights from DataKitchen's CEO
Chris Bergh, CEO of DataKitchen, is on a mission to simplify the lives of data engineers. He discusses the complexities of DataOps and the frustrations of constant system failures and client demands. Chris emphasizes the need for process-focused approaches over mere tools. He introduces DataKitchen's open-source tools, DataOps TestGen and Observability, which enhance data quality and monitoring. Stressing team dynamics, he advocates for collaboration and proactive communication to improve data workflows and overall job satisfaction.

Jul 28, 2024 • 49min
Achieving Data Reliability: The Role of Data Contracts in Modern Data Management
Tom Baeyens, an expert in data management, dives into the pivotal role of data contracts in ensuring reliability. He explains how these contracts act as guarantees for data quality and adherence to schemas. The discussion emphasizes the importance of robust testing and observability strategies to prevent issues in data pipelines. Baeyens also covers the collaboration required between data producers and consumers, along with the potential of generative AI to transform data contract management, paving the way for enhanced integrity in analytical data.

Jul 21, 2024 • 55min
How Generative AI Is Impacting Data Engineering Teams
Lior Gavish, co-founder of Monte Carlo, discusses how data teams are evolving to support AI features and incorporating AI into their work. Topics include the impact of generative AI, integrating AI models into data workflows, challenges in building with generative AI, and optimizing AI solutions for data teams.

Jul 13, 2024 • 53min
The Role of Product Managers in Data-Centric Organizations
Praveen Gujjar, Director of Product at LinkedIn, discusses the evolving role of product managers in data-centric environments, emphasizing clean, reliable data. He highlights challenges in building scalable data platforms, platformization complexities, and the role of product managers in bridging engineering and business teams. Praveen provides insights on long-term planning, strong relationships with engineering teams, and the future of data management.

Jul 8, 2024 • 58min
Neon: A Serverless And Developer Friendly Postgres
Nikita Shamgunov shares the journey of creating Neon, a serverless Postgres solution. Topics include maintaining Postgres compatibility, using database branches for isolated environments, managing latency and reliability in deployments, the pgvector extension for AI applications, open source vs. business models, and future plans for Neon.