Data Engineering Podcast

Tobias Macey
43 snips
Jan 5, 2026 • 49min

Beyond Dashboards: How Data Teams Earn a Seat at the Table

Goutham Budati, a data leader known for the Data–Perspective–Action framework, explores how data teams can elevate their influence in business. He shares insights on transforming reactive tasks into proactive strategies, emphasizing the importance of context and storytelling. Goutham discusses the necessity of creating living dashboards, aligning technical projects with business goals, and maintaining trust in metrics. He advocates for collaboration between analytics engineers and analysts, promoting continuous insight generation through structured narratives.
11 snips
Dec 29, 2025 • 59min

Unfreezing The Data Lake: The Future-Proof File Format

Xinyu Zeng, a PhD student and database researcher, dives deep into F3, the 'future-proof file format' he’s developing. He highlights the limitations of existing formats like Parquet and ORC, such as CPU-bound decoding and metadata overhead. By rethinking the layout and using WebAssembly for self-decoding, F3 aims to advance data handling. Xinyu discusses the importance of decoupling formats, support for multimodal data, and future directions, including integrating with existing technologies to enhance data lakes.
43 snips
Dec 21, 2025 • 1h 6min

From Context to Semantics: How Metadata Powers Agentic AI

Suresh Srinivas, a data platform technologist and co-founder of OpenMetadata, and Sriharsha Chintalapani, CTO of Collate, delve into the transformative role of metadata in AI. They discuss how metadata evolves from a human-centric tool to a foundational layer for AI, emphasizing the importance of semantics for accurate outcomes. The conversation highlights automated documentation and governance enhancements, scaling agent workflows, and the crucial balance of user identity and policy enforcement as AI access expands. Their insights reveal how marrying big data with ontologies can create machine-understandable meaning.
76 snips
Dec 14, 2025 • 27min

From Data Engineering to AI Engineering: Where the Lines Blur

Explore the evolution of data engineering as it merges with AI. Discover how the transition from Hadoop to cloud warehouses has shaped current practices. Uncover the impact of LLMs and how unstructured data is revolutionizing information retrieval. Delve into operational demands, including uptime and latency, in customer-facing applications. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. This journey emphasizes adapting skills to a rapidly changing technological landscape.
65 snips
Dec 8, 2025 • 59min

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses revolutionizing data interaction beyond SQL. He shares insights on Malloy’s design as a human-centric, composable language, emphasizing semantic modeling and hierarchical data. Michael explains the practical implications of eliminating SQL barriers and the user-friendly syntax aimed at easing adoption. He also highlights the exciting synergy between Malloy and LLM-generated queries and invites collaboration in the open-source development to shape its future.
33 snips
Nov 24, 2025 • 1h 1min

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

Max Beauchemin, creator of Apache Airflow, dives into the interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles towards orchestration and taste management, leading to new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.
57 snips
Nov 16, 2025 • 52min

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Preeti Somal, EVP of Engineering at Temporal and expert in durable execution, dives into innovative methods for building stateful data systems. She discusses how Temporal's code-first model simplifies reliability and reduces the need for error-handling scaffolding. With insights on integrating application and data teams, managing large data while keeping orchestration lightweight, and the importance of observability, Preeti shares strategies for efficiently handling long-running AI workflows. She also highlights practical adoption patterns and the role of Nexus in creating seamless cross-boundary calls.
71 snips
Nov 9, 2025 • 52min

The AI Data Paradox: High Trust in Models, Low Trust in Data

Ariel Pohoryles, head of product marketing at Boomi with over 20 years in data engineering, discusses a fascinating survey of 300 data leaders. He reveals the surprising paradox where 77% trust AI data yet only 50% trust their organization's overall data. Ariel emphasizes the need for stronger automation and governance in data management for effective AI production. He explores the challenges of unstructured data, advocates for automated pipelines, and predicts a convergence between data and application teams, highlighting the importance of managing AI workflows responsibly.
34 snips
Nov 2, 2025 • 51min

Bridging the AI–Data Gap: Collect, Curate, Serve

Omri Lifshitz and Ido Bronstein, co-founders of Upriver, delve into the challenges of bridging the gap between AI's demand for quality data and current organizational practices. They highlight the importance of the middle layer of data curation and semantics, presenting a three-part framework: collect, curate, and serve. The duo discusses scaling from proofs of concept to production, the significance of context in AI responses, and innovative methods for automating data documentation. They envision an AI-first future where data engineers focus on strategic roles and oversee business semantics.
12 snips
Oct 27, 2025 • 1h 5min

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

In this discussion, Matt Topper, President of UberEther and a veteran in identity and data, dives deep into the complexities of managing identity and access control within modern data platforms. He highlights challenges posed by composable ecosystems and offers innovative solutions like using JWTs and external policy engines. Topics also include cryptographic policy binding with OpenTDF, the importance of governance in data systems, and how AI could translate regulations into actionable policies. The conversation reveals critical insights into securing data access while promoting seamless integration.
