

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Feb 26, 2025 • 60min
The Future of Data Engineering: AI, LLMs, and Automation
Gleb Mezhanskiy, CEO and co-founder of Datafold, shares insights from his journey in data engineering and the integration of AI. He discusses how large language models can streamline code writing, improve data accessibility, and facilitate testing and code reviews. Mezhanskiy emphasizes the challenges at the intersection of AI and data workflows, advocating for continuous adaptation. With practical applications like text-to-SQL and enhanced data observability, he paints an optimistic picture for the future of data engineering.

Feb 16, 2025 • 39min
Evolving Responsibilities in AI Data Management
Bartosz Mikulski, an MLOps engineer with a rich background in data engineering, dives deep into the realm of AI data management. He highlights the crucial role of data testing in AI applications, especially with the rise of generative AI. Bartosz discusses the need for specialized datasets and the skills required for data engineers to transition into AI. He also addresses challenges like frequent data reprocessing and unstructured data handling, showcasing the evolving responsibilities in this fast-paced field.

Jan 13, 2025 • 55min
CSVs Will Never Die And OneSchema Is Counting On It
Andrew Luo, CEO of OneSchema, shares his expertise in data engineering and CRM migration, focusing on the enduring relevance of CSVs. He discusses the common challenges of CSV data, such as inconsistency and lack of standards, and explains how OneSchema uses AI for improved type validation and parsing. Andrew highlights OneSchema's potential to streamline data imports and boost efficiency, particularly for non-technical users. He also reveals plans for future innovations, including industry-specific transformation packs to enhance data management further.

Jan 3, 2025 • 58min
Breaking Down Data Silos: AI and ML in Master Data Management
Dan Bruckner, Co-founder and CTO of Tamr and former CERN physicist, shares his insights into master data management (MDM) enhanced by AI and machine learning. He discusses his transition from physics to data science, highlighting challenges in reconciling large data sets. Dan explains how data silos form within organizations and emphasizes the role of large language models in improving user experience and data trust. He advocates for combining AI capabilities with human oversight to ensure accuracy while tackling complex data management issues.

Dec 23, 2024 • 50min
Building a Data Vision Board: A Guide to Strategic Planning
Lior Barak, a data expert with 15 years of experience in data product strategy, shares invaluable insights on strategic planning in data management. He introduces the concept of a 'data vision board' as a tool for organizations to align their data strategies with regulatory and stakeholder needs. Lior emphasizes the importance of balancing immediate demands with long-term goals, quantifying data issues for prioritization, and maintaining a flexible, living strategic plan. His practical advice encourages data teams to transition from mere enablers to impactful creators.

Dec 16, 2024 • 60min
How Orchestration Impacts Data Platform Architecture
Hugo Lu, CEO and co-founder of Orchestra, delves into the vital role of data orchestration in platform architecture. He highlights how the choice of orchestration engines influences data flow management and overall efficiency. The discussion covers the evolution of orchestration from early models to modern applications like Kubernetes, reveals the challenges of traditional systems, and emphasizes the need for flexibility in architecture. Lu also addresses the distinct demands of analytical versus product-oriented applications, especially with the rise of AI integration.

Dec 8, 2024 • 52min
An Exploration Of The Impediments To Reusable Data Pipelines
Max Beauchemin, a data engineer with two decades of experience and founder of Preset, dives into the complexities of reusable data pipelines. He discusses the "write everything twice" problem, emphasizing the need for collaboration and shared reference implementations. Max explores the challenges of managing diverse SQL dialects and the evolving role of data engineers, likening it to front-end development. He envisions generative AI aiding knowledge distribution and encourages the community to engage in sharing templates to drive innovation in the field.

Dec 1, 2024 • 60min
The Art of Database Selection and Evolution
Sam Kleinman, a seasoned software engineer with experience at MongoDB, discusses how to select a database. He covers the critical trade-offs in database architectures and how they shape system design. Sam warns against the pitfalls of over-engineering and stresses leveraging database capabilities rather than pushing logic to the application layer. He identifies a significant gap in effective testing tools for database interactions, advocating for improved paradigms to ensure reliability.

Nov 26, 2024 • 45min
Bridging Code and UI in Data Orchestration with Kestra
Anna Geller, Product Lead at Kestra and former data engineer at KPMG, discusses data orchestration. She explains how Kestra bridges the gap between coding and user interfaces, advocating for a hybrid low-code approach. Anna highlights the limitations of existing tools and how Kestra’s API-first design and scalable architecture tackle these challenges. The conversation also touches on the complexities of managing workflows, the role of real-time data, and the functionality that serves both technical and non-technical users.

Nov 18, 2024 • 40min
Streaming Data Into The Lakehouse With Iceberg And Trino At Going
Ken Pickering, VP of Engineering at Going, leads a data platform team focused on finding the best travel deals. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, sharing his experience in managing vast flight datasets. Ken elaborates on the team's dual approach to search (passive and active) and on technologies such as Confluent and Databricks that support their operations. He highlights collaboration within the engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.