Data Engineering Podcast

Tobias Macey
21 snips
Feb 1, 2026 • 57min

Branches, Diffs, and SQL: How Dolt Powers Agentic Workflows

Tim Sehn, founder and CEO of DoltHub and creator of Dolt — a version-controlled SQL database — explains why Git-style semantics belong in data systems. He covers row-level branching, merging, and diffs, real production use cases like reproducible ML feature stores and game config, and how branches enable safe agentic writes and PR-style data reviews.
74 snips
Jan 25, 2026 • 41min

Logical First, Physical Second: A Pragmatic Path to Trusted Data

Jamie Knowles, Product Director for ER/Studio with decades in data modeling and architecture, explains why meaning should drive designs. He talks about building shared semantic models, avoiding schema sprawl, and evolving architecture alongside delivery. He also covers governance, practical modeling techniques, and the double-edged role of generative AI in drafting models without human-approved ontologies.
43 snips
Jan 18, 2026 • 1h 12min

Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

Jacob Leverich, Cofounder and CTO of Observe, brings his vast experience from Splunk and Google to discuss the transformative power of lakehouse architectures in observability. He addresses the struggles organizations face with fragmented tools and high costs, introducing innovative solutions leveraging OpenTelemetry and Kafka for efficient data ingestion. Jacob dives into the benefits of using Iceberg for better data organization, the intricacies of query orchestration for low-latency responses, and the importance of metadata in enhancing user experience.
23 snips
Jan 12, 2026 • 57min

Semantic Operators Meet Dataframes: Building Context for Agents with FENIC

Kostas Pardalis, a data infrastructure engineer and founder, discusses Fenic, a DataFrame engine designed for LLM-powered data workflows. He explains the limitations of traditional data infrastructures and introduces semantic operators that transform unstructured data into structured schemas. Kostas delves into Fenic's architecture, lazy DataFrame plans, and optimizer design, emphasizing its role in enhancing context management for agents. He also shares practical use cases and the future potential of integrating Fenic with other frameworks for scalable, reliable data solutions.
45 snips
Jan 5, 2026 • 49min

Beyond Dashboards: How Data Teams Earn a Seat at the Table

Goutham Budati, a data leader known for the Data–Perspective–Action framework, explores how data teams can elevate their influence in business. He shares insights on transforming reactive tasks into proactive strategies, emphasizing the importance of context and storytelling. Goutham discusses the necessity of creating living dashboards, aligning technical projects with business goals, and maintaining trust in metrics. He advocates for collaboration between analytics engineers and analysts, promoting continuous insight generation through structured narratives.
13 snips
Dec 29, 2025 • 59min

Unfreezing The Data Lake: The Future-Proof File Format

Xinyu Zeng, a PhD student and database researcher, dives deep into F3, the innovative 'future-proof file format' he’s developing. He highlights the limitations of existing formats like Parquet and ORC, tackling issues such as CPU-bound decoding and metadata overhead. By rethinking the layout and using WebAssembly for self-decoding, F3 aims to advance data handling. Xinyu discusses the importance of decoupling formats and of supporting multimodal data, and shares future directions, including integrating with existing technologies to enhance data lakes.
50 snips
Dec 21, 2025 • 1h 6min

From Context to Semantics: How Metadata Powers Agentic AI

Suresh Srinivas, a data platform technologist and co-founder of OpenMetadata, and Sriharsha Chintalapani, CTO of Collate, delve into the transformative role of metadata in AI. They discuss how metadata evolves from a human-centric tool to a foundational layer for AI, emphasizing the importance of semantics for accurate outcomes. The conversation highlights automated documentation and governance enhancements, scaling agent workflows, and the crucial balance of user identity and policy enforcement as AI access expands. Their insights reveal how marrying big data with ontologies can create machine-understandable meaning.
76 snips
Dec 14, 2025 • 27min

From Data Engineering to AI Engineering: Where the Lines Blur

Explore the evolution of data engineering as it merges with AI. Discover how the transition from Hadoop to cloud warehouses has shaped current practices. Uncover the impact of LLMs and how unstructured data is revolutionizing information retrieval. Delve into operational demands, including uptime and latency, in customer-facing applications. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. This journey emphasizes adapting skills to a rapidly changing technological landscape.
65 snips
Dec 8, 2025 • 59min

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses revolutionizing data interaction beyond SQL. He shares insights on Malloy’s design as a human-centric, composable language, emphasizing semantic modeling and hierarchical data. Michael explains the practical implications of eliminating SQL barriers and the user-friendly syntax aimed at easing adoption. He also highlights the exciting synergy between Malloy and LLM-generated queries and invites collaboration in the open-source development to shape its future.
48 snips
Nov 24, 2025 • 1h 1min

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

Max Beauchemin, founder and engineer behind Apache Airflow, dives into the transformative interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles towards orchestration and taste management, leading to new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.
