

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

21 snips
Feb 1, 2026 • 57min
Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows
Tim Sehn, founder and CEO of DoltHub and creator of Dolt — a version-controlled SQL database — explains why Git-style semantics belong in data systems. He covers row-level branching, merging, and diffs, real production use cases like reproducible ML feature stores and game config, and how branches enable safe agentic writes and PR-style data reviews.

74 snips
Jan 25, 2026 • 41min
Logical First, Physical Second: A Pragmatic Path to Trusted Data
Jamie Knowles, Product Director for ER/Studio with decades in data modeling and architecture, explains why meaning should drive designs. He talks about building shared semantic models, avoiding schema sprawl, and evolving architecture alongside delivery. He also covers governance, practical modeling techniques, and the double-edged role of generative AI in drafting models without human-approved ontologies.

43 snips
Jan 18, 2026 • 1h 12min
Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability
Jacob Leverich, Cofounder and CTO of Observe, brings his vast experience from Splunk and Google to discuss the transformative power of lakehouse architectures in observability. He addresses the struggles organizations face with fragmented tools and high costs, introducing innovative solutions leveraging OpenTelemetry and Kafka for efficient data ingestion. Jacob dives into the benefits of using Iceberg for better data organization, the intricacies of query orchestration for low-latency responses, and the importance of metadata in enhancing user experience.

23 snips
Jan 12, 2026 • 57min
Semantic Operators Meet Dataframes: Building Context for Agents with FENIC
Kostas Pardalis, a data infrastructure engineer and founder, discusses Fenic, a revolutionary DataFrame engine designed for LLM-powered data workflows. He explains the limitations of traditional data infrastructures and introduces semantic operators that transform unstructured data into structured schemas. Kostas delves into Fenic's architecture, lazy DataFrame plans, and optimizer design, emphasizing its role in enhancing context management for agents. He also shares practical use cases and the future potential of integrating Fenic with other frameworks for scalable, reliable data solutions.

45 snips
Jan 5, 2026 • 49min
Beyond Dashboards: How Data Teams Earn a Seat at the Table
Goutham Budati, a data leader known for the Data–Perspective–Action framework, explores how data teams can elevate their influence in business. He shares insights on transforming reactive tasks into proactive strategies, emphasizing the importance of context and storytelling. Goutham discusses the necessity of creating living dashboards, aligning technical projects with business goals, and maintaining trust in metrics. He advocates for collaboration between analytics engineers and analysts, promoting continuous insight generation through structured narratives.

13 snips
Dec 29, 2025 • 59min
Unfreezing The Data Lake: The Future-Proof File Format
Xinyu Zeng, a PhD student and database researcher, dives deep into F3, the innovative 'future-proof file format' he’s developing. He highlights the limitations of existing formats like Parquet and ORC, tackling issues such as CPU-bound decoding and metadata overhead. By rethinking the layout and using WebAssembly for self-decoding, F3 aims to advance data handling. Xinyu discusses the importance of decoupling formats, support for multimodal data, and future directions, including integrating with existing technologies to enhance data lakes.

50 snips
Dec 21, 2025 • 1h 6min
From Context to Semantics: How Metadata Powers Agentic AI
Suresh Srinivas, a data platform technologist and co-founder of OpenMetadata, and Sriharsha Chintalapani, CTO of Collate, delve into the transformative role of metadata in AI. They discuss how metadata evolves from a human-centric tool to a foundational layer for AI, emphasizing the importance of semantics for accurate outcomes. The conversation highlights automated documentation and governance enhancements, scaling agent workflows, and the crucial balance of user identity and policy enforcement as AI access expands. Their insights reveal how marrying big data with ontologies can create machine-understandable meaning.

76 snips
Dec 14, 2025 • 27min
From Data Engineering to AI Engineering: Where the Lines Blur
Explore the evolution of data engineering as it merges with AI. Discover how the transition from Hadoop to cloud warehouses has shaped current practices. Uncover the impact of LLMs and how unstructured data is revolutionizing information retrieval. Delve into operational demands, including uptime and latency, in customer-facing applications. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. This journey emphasizes adapting skills to a rapidly changing technological landscape.

65 snips
Dec 8, 2025 • 59min
Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics
Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses revolutionizing data interaction beyond SQL. He shares insights on Malloy’s design as a human-centric, composable language, emphasizing semantic modeling and hierarchical data. Michael explains the practical implications of eliminating SQL barriers and the user-friendly syntax aimed at easing adoption. He also highlights the exciting synergy between Malloy and LLM-generated queries and invites collaboration in the open-source development to shape its future.

48 snips
Nov 24, 2025 • 1h 1min
Blurring Lines: Data, AI, and the New Playbook for Team Velocity
Max Beauchemin, founder and engineer behind Apache Airflow, dives into the transformative interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles towards orchestration and taste management, leading to new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.


