Data Engineering Podcast

Tobias Macey
24 snips
Nov 24, 2025 • 1h 1min

Blurring Lines: Data, AI, and the New Playbook for Team Velocity

Max Beauchemin, creator of Apache Airflow, dives into the interplay of data and AI engineering. He discusses how using AI for most tasks shifts human roles toward orchestration and taste management, creating new bottlenecks in code review and QA. Max highlights the concept of treating context as code and advocates for just-in-time retrieval to enhance data tools. He also introduces Agor, a multiplayer orchestration platform designed for efficient agent management and collaborative workflows.
48 snips
Nov 16, 2025 • 52min

State, Scale, and Signals: Rethinking Orchestration with Durable Execution

Preeti Somal, EVP of Engineering at Temporal and expert in durable execution, dives into innovative methods for building stateful data systems. She discusses how Temporal's code-first model simplifies reliability and reduces the need for error-handling scaffolding. With insights on integrating application and data teams, managing large data while keeping orchestration lightweight, and the importance of observability, Preeti shares strategies for efficiently handling long-running AI workflows. She also highlights practical adoption patterns and the role of Nexus in creating seamless cross-boundary calls.
71 snips
Nov 9, 2025 • 52min

The AI Data Paradox: High Trust in Models, Low Trust in Data

Ariel Pohoryles, head of product marketing at Boomi with over 20 years in data engineering, discusses a fascinating survey of 300 data leaders. He reveals the surprising paradox where 77% trust AI data yet only 50% trust their organization's overall data. Ariel emphasizes the need for stronger automation and governance in data management for effective AI production. He explores the challenges of unstructured data, advocates for automated pipelines, and predicts a convergence between data and application teams, highlighting the importance of managing AI workflows responsibly.
34 snips
Nov 2, 2025 • 51min

Bridging the AI–Data Gap: Collect, Curate, Serve

Omri Lifshitz and Ido Bronstein, co-founders of Upriver, delve into the challenges of bridging the gap between AI's demand for quality data and current organizational practices. They highlight the importance of the middle layer of data curation and semantics, presenting a three-part framework: collect, curate, and serve. The duo discusses scaling from proofs of concept to production, the significance of context in AI responses, and innovative methods for automating data documentation. They envision an AI-first future where data engineers focus on strategic roles and oversee business semantics.
12 snips
Oct 27, 2025 • 1h 5min

Beyond the Perimeter: Practical Patterns for Fine‑Grained Data Access

In this discussion, Matt Topper, President of UberEther and a veteran in identity and data, dives deep into the complexities of managing identity and access control within modern data platforms. He highlights challenges posed by composable ecosystems and offers innovative solutions like using JWTs and external policy engines. Topics also include cryptographic policy binding with OpenTDF, the importance of governance in data systems, and how AI could translate regulations into actionable policies. The conversation reveals critical insights into securing data access while promoting seamless integration.
54 snips
Oct 18, 2025 • 1h 4min

The True Costs of Legacy Systems: Technical Debt, Risk, and Exit Strategies

Kate Shaw, Senior Product Manager for Data at SnapLogic, dives into the complexities of legacy systems and their modern replacements. She highlights that legacy isn't just about age; it's about risk and barriers to innovation. She discusses technical debt, context lost to turnover, and the dangers of an 'if it ain’t broke' mindset. Shaw advocates for composable architectures and planning exit strategies from day one. Additionally, she touches on integrating legacy systems into AI initiatives and the importance of transparency in data governance. A must-listen for anyone navigating modernization!
88 snips
Oct 11, 2025 • 52min

Context Engineering as a Discipline: Building Governed AI Analytics

Nick Schrock, CTO and founder of Dagster Labs, shares his insights on agentic analytics and the innovative Compass tool he developed. He explains how Compass transforms data teams into stewards of context while integrating seamlessly with Slack for enhanced collaboration. Schrock discusses the implications of agentic systems on Conway's Law and the need for new infrastructure to support these workflows. He also highlights cost control strategies and the future of context engineering in software development, unveiling his optimistic outlook on AI advancements.
134 snips
Oct 5, 2025 • 1h 1min

The Data Model That Captures Your Business: Metric Trees Explained

Vijay Subramanian, CEO of Trace and former data leader at Rent the Runway, dives into the revolutionary concept of metric trees as a data model that mirrors a company's business framework. He reveals how traditional dashboards often miss the mark and how metric trees can enhance analytical workflows by clarifying cause and effect. Vijay shares insights on leveraging these trees alongside AI agents for operational analytics and discusses real-world applications like modeling customer journeys. He also emphasizes the importance of collaboration with business teams to effectively implement this innovative approach.
18 snips
Sep 28, 2025 • 57min

From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra

Brijesh Tripathi, CEO of Flex AI, combines his rich background in AI and HPC architecture to revolutionize AI infrastructure. He discusses the burdens of DevOps that slow down small AI teams and highlights Flex AI's innovative workload-as-a-service approach. Brijesh breaks down the challenges of accessing heterogeneous compute, the importance of consistent Kubernetes layers, and how to smooth costs for spiky workloads. He also shares insights on handling real-time vs. best-effort workloads, maximizing utilization, and ensuring that AI teams can focus on creativity instead of complexity.
53 snips
Sep 18, 2025 • 53min

From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture

Marc Brooker, VP and Distinguished Engineer at AWS, dives into how agentic workflows are revolutionizing database infrastructure. He shares insights on why agents demand serverless, elastic databases and discusses the shift from traditional data models to vectors and relational databases. Marc explores the significance of tools like Aurora DSQL for managing global agent workloads and highlights real-world applications, such as agent-driven SQL fuzzing. He also emphasizes the need for improved identity and authorization in our evolving data landscape.
