Data Engineering Podcast

Tobias Macey
68 snips
Dec 14, 2025 • 27min

From Data Engineering to AI Engineering: Where the Lines Blur

Explore the evolution of data engineering as it merges with AI. Discover how the transition from Hadoop to cloud warehouses has shaped current practices. Uncover the impact of LLMs and how unstructured data is revolutionizing information retrieval. Delve into operational demands, including uptime and latency, in customer-facing applications. Reflect on the need for collaboration, new testing practices, and a community approach to emerging AI workflows. This journey emphasizes adapting skills to a rapidly changing technological landscape.
62 snips
Dec 8, 2025 • 59min

Malloy: Hierarchical Data, Semantic Models, and the Future of Analytics

Michael Toy, co-creator of the Malloy language and former Looker engineer, discusses revolutionizing data interaction beyond SQL. He shares insights on Malloy’s design as a human-centric, composable language, emphasizing semantic modeling and hierarchical data. Michael explains the practical implications of eliminating SQL barriers and the user-friendly syntax aimed at easing adoption. He also highlights the exciting synergy between Malloy and LLM-generated queries and invites collaboration in the open-source development to shape its future.
29 snips
Nov 24, 2025 • 1h 1min

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

Max Beauchemin, founder and engineer behind Apache Airflow, dives into the transformative interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles towards orchestration and taste management, leading to new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.
57 snips
Nov 16, 2025 • 52min

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Preeti Somal, EVP of Engineering at Temporal and expert in durable execution, dives into innovative methods for building stateful data systems. She discusses how Temporal's code-first model simplifies reliability and reduces the need for error-handling scaffolding. With insights on integrating application and data teams, managing large data while keeping orchestration lightweight, and the importance of observability, Preeti shares strategies for efficiently handling long-running AI workflows. She also highlights practical adoption patterns and the role of Nexus in creating seamless cross-boundary calls.
71 snips
Nov 9, 2025 • 52min

The AI Data Paradox: High Trust in Models, Low Trust in Data

Ariel Pohoryles, head of product marketing at Boomi with over 20 years in data engineering, discusses a fascinating survey of 300 data leaders. He reveals the surprising paradox where 77% trust AI data yet only 50% trust their organization's overall data. Ariel emphasizes the need for stronger automation and governance in data management for effective AI production. He explores the challenges of unstructured data, advocates for automated pipelines, and predicts a convergence between data and application teams, highlighting the importance of managing AI workflows responsibly.
34 snips
Nov 2, 2025 • 51min

Bridging the AI–Data Gap: Collect, Curate, Serve

Omri Lifshitz and Ido Bronstein, co-founders of Upriver, delve into the challenges of bridging the gap between AI's demand for quality data and current organizational practices. They highlight the importance of the middle layer of data curation and semantics, presenting a three-part framework: collect, curate, and serve. The duo discusses scaling from proofs of concept to production, the significance of context in AI responses, and innovative methods for automating data documentation. They envision an AI-first future where data engineers focus on strategic roles and oversee business semantics.
12 snips
Oct 27, 2025 • 1h 5min

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

In this discussion, Matt Topper, President of UberEther and a veteran in identity and data, dives deep into the complexities of managing identity and access control within modern data platforms. He highlights challenges posed by composable ecosystems and offers innovative solutions like using JWTs and external policy engines. Topics also include cryptographic policy binding with OpenTDF, the importance of governance in data systems, and how AI could translate regulations into actionable policies. The conversation reveals critical insights into securing data access while promoting seamless integration.
54 snips
Oct 18, 2025 • 1h 4min

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies

Kate Shaw, Senior Product Manager for Data at SnapLogic, dives into the complexities of legacy systems and their modern replacements. She highlights that legacy isn't just about age; it's about risk and barriers to innovation. The conversation covers technical debt, context lost to turnover, and the dangers of an 'if it ain’t broke' mindset. Shaw advocates for composable architectures and planning exit strategies from day one. She also touches on integrating legacy systems into AI initiatives and the importance of transparency in data governance. A must-listen for anyone navigating modernization!
88 snips
Oct 11, 2025 • 52min

Context Engineering as a Discipline: Building Governed AI Analytics

Nick Schrock, CTO and founder of Dagster Labs, shares his insights on agentic analytics and the innovative Compass tool he developed. He explains how Compass transforms data teams into stewards of context while integrating seamlessly with Slack for enhanced collaboration. Schrock discusses the implications of agentic systems on Conway's Law and the need for new infrastructure to support these workflows. He also highlights cost control strategies and the future of context engineering in software development, unveiling his optimistic outlook on AI advancements.
137 snips
Oct 5, 2025 • 1h 1min

The Data Model That Captures Your Business: Metric Trees Explained

Vijay Subramanian, CEO of Trace and former data leader at Rent the Runway, dives into the revolutionary concept of metric trees as a data model that mirrors a company's business framework. He reveals how traditional dashboards often miss the mark and how metric trees can enhance analytical workflows by clarifying cause and effect. Vijay shares insights on leveraging these trees alongside AI agents for operational analytics and discusses real-world applications like modeling customer journeys. He also emphasizes the importance of collaboration with business teams to effectively implement this innovative approach.
