

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes of the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Dec 14, 2025 • 27min
From Data Engineering to AI Engineering: Where the Lines Blur
Explore the evolution of data engineering as it converges with AI engineering. Discover how the transition from Hadoop to cloud warehouses shaped current practices, and how LLMs and unstructured data are changing information retrieval. Delve into the operational demands of customer-facing applications, including uptime and latency. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. The episode emphasizes adapting skills to a rapidly changing technological landscape.

Dec 8, 2025 • 59min
Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics
Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses moving data interaction beyond SQL. He explains Malloy’s design as a human-centric, composable language built around semantic modeling and hierarchical data. Michael covers the practical implications of removing SQL's barriers and the approachable syntax intended to ease adoption. He also highlights the synergy between Malloy and LLM-generated queries and invites contributions to the open-source project to help shape its future.

Nov 24, 2025 • 1h 1min
Blurring Lines: Data, AI, and the New Playbook for Team Velocity
Max Beauchemin, creator of Apache Airflow, explores the interplay between data engineering and AI engineering. He discusses how delegating most tasks to AI shifts human roles toward orchestration and exercising taste, creating new bottlenecks in code review and QA. Max makes the case for treating context as code and advocates just-in-time retrieval to improve data tools. He also introduces Agor, a multiplayer orchestration platform for managing agents and collaborative workflows.

Nov 16, 2025 • 52min
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
Preeti Somal, EVP of Engineering at Temporal and an expert in durable execution, discusses approaches to building stateful data systems. She explains how Temporal's code-first model simplifies reliability and reduces the need for error-handling scaffolding. Covering the integration of application and data teams, managing large volumes of data while keeping orchestration lightweight, and the importance of observability, Preeti shares strategies for efficiently handling long-running AI workflows. She also describes practical adoption patterns and the role of Nexus in enabling seamless cross-boundary calls.

Nov 9, 2025 • 52min
The AI Data Paradox: High Trust in Models, Low Trust in Data
Ariel Pohoryles, head of product marketing at Boomi with over 20 years in data engineering, discusses a survey of 300 data leaders. He unpacks a surprising paradox: 77% of respondents trust AI data, yet only 50% trust their organization's data overall. Ariel emphasizes the need for stronger automation and governance in data management to get AI into production. He explores the challenges of unstructured data, advocates for automated pipelines, and predicts a convergence of data and application teams, stressing the importance of managing AI workflows responsibly.

Nov 2, 2025 • 51min
Bridging the AI–Data Gap: Collect, Curate, Serve
Omri Lifshitz and Ido Bronstein, co-founders of Upriver, examine the gap between AI's demand for quality data and current organizational practices. They highlight the importance of the middle layer of data curation and semantics, presenting a three-part framework: collect, curate, and serve. The duo discusses scaling from proofs of concept to production, the significance of context in AI responses, and methods for automating data documentation. They envision an AI-first future where data engineers take on strategic roles and oversee business semantics.

Oct 27, 2025 • 1h 5min
Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access
In this discussion, Matt Topper, President of UberEther and a veteran in identity and data, dives deep into the complexities of managing identity and access control within modern data platforms. He highlights challenges posed by composable ecosystems and offers innovative solutions like using JWTs and external policy engines. Topics also include cryptographic policy binding with OpenTDF, the importance of governance in data systems, and how AI could translate regulations into actionable policies. The conversation reveals critical insights into securing data access while promoting seamless integration.

Oct 18, 2025 • 1h 4min
The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies
Kate Shaw, Senior Product Manager for Data at SnapLogic, explores the complexities of legacy systems and their modern replacements. She argues that legacy isn't just about age; it's about risk and barriers to innovation. The conversation covers technical debt, context lost to staff turnover, and the dangers of an "if it ain’t broke" mindset. Shaw advocates for composable architectures and planning exit strategies from day one. She also discusses integrating legacy systems into AI initiatives and the importance of transparency in data governance. A must-listen for anyone navigating modernization!

Oct 11, 2025 • 52min
Context Engineering as a Discipline: Building Governed AI Analytics
Nick Schrock, CTO and founder of Dagster Labs, shares his insights on agentic analytics and the Compass tool he developed. He explains how Compass turns data teams into stewards of context and integrates with Slack for collaboration. Schrock discusses the implications of agentic systems for Conway's Law and the need for new infrastructure to support these workflows. He also covers cost control strategies and the future of context engineering in software development, and shares his optimistic outlook on AI advancements.

Oct 5, 2025 • 1h 1min
The Data Model That Captures Your Business: Metric Trees Explained
Vijay Subramanian, CEO of Trace and former data leader at Rent the Runway, explains metric trees: a data model that mirrors a company's business framework. He describes where traditional dashboards fall short and how metric trees improve analytical workflows by clarifying cause and effect. Vijay shares insights on pairing these trees with AI agents for operational analytics and discusses real-world applications such as modeling customer journeys. He also emphasizes the importance of collaborating with business teams to implement this approach effectively.


