

Data Engineering Podcast
Tobias Macey
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

18 snips
Sep 28, 2025 • 57min
From GPUs-as-a-Service to Workloads-as-a-Service: FlexAI’s Path to High-Utilization AI Infra
Brijesh Tripathi, CEO of FlexAI, draws on his background in AI and HPC architecture to rethink AI infrastructure. He discusses the DevOps burden that slows down small AI teams and explains FlexAI's workload-as-a-service approach. Brijesh breaks down the challenges of accessing heterogeneous compute, the importance of a consistent Kubernetes layer, and how to smooth costs for spiky workloads. He also shares insights on handling real-time vs. best-effort workloads, maximizing utilization, and ensuring that AI teams can focus on creativity instead of complexity.

62 snips
Sep 18, 2025 • 53min
From RAG to Relational: How Agentic Patterns Are Reshaping Data Architecture
Marc Brooker, VP and Distinguished Engineer at AWS, dives into how agentic workflows are reshaping database infrastructure. He shares insights on why agents demand serverless, elastic databases and discusses the shift from traditional data models to vectors and relational databases. Marc explores the significance of tools like Aurora DSQL for managing global agent workloads and highlights real-world applications, such as agent-driven SQL fuzzing. He also emphasizes the need for improved identity and authorization in our evolving data landscape.

66 snips
Sep 10, 2025 • 1h 11min
DuckLake: Simplifying the Lakehouse Ecosystem
Hannes Mühleisen and Mark Raasveldt, key figures behind DuckDB, dive into their latest project, DuckLake, which aims to simplify the lakehouse ecosystem. They discuss how DuckLake stands out by keeping its metadata in a single SQL database, which simplifies metadata management. The duo shares their vision for decentralized processing, local-first data architecture, and benefits like data inlining and encryption. They also touch on its integration with existing systems and how it can change data workflows.

84 snips
Sep 1, 2025 • 1h 7min
Aligning Business and Data: The Essential Role of Data Modeling
Serge Gershkovich, Head of Product at SqlDBM and a Snowflake data expert, dives into the socio-technical aspects of data modeling. He argues that effective data modeling is crucial for aligning business needs with technical structures and debunks myths that it is unimportant. The discussion explores challenges in complex environments and the evolving role of AI in data management. Serge advocates for collaboration between business teams and data professionals, highlighting how clear communication can enhance trust and mitigate data quality issues.

45 snips
Aug 26, 2025 • 51min
From Academia to Industry: Bridging Data Engineering Challenges
In this engaging discussion, Professor Paul Groth from the University of Amsterdam shares his expertise in AI systems and intelligent data engineering. He dives into the evolution of data provenance and lineage, illustrating its significance in today's workflows. Paul also highlights the transformative impact of large language models on knowledge graph construction and data integration. The conversation addresses the synergy between academia and industry, emphasizing human-AI collaboration and the need for tailored data management solutions.

14 snips
Aug 18, 2025 • 1h 1min
High Performance And Low Overhead Graphs With KuzuDB
Prashanth Rao, an AI engineer at KuzuDB, delves into the cutting-edge features of their embeddable graph database. He explains how KuzuDB tackles performance issues with innovative columnar storage and unique join algorithms. The conversation reveals KuzuDB's potential for enhancing graph applications, especially in edge computing and ephemeral workloads. Prashanth also discusses the growing interest in graph databases for AI integration and how Kuzu can seamlessly work with other data formats like Iceberg and Parquet.

116 snips
Aug 12, 2025 • 1h 11min
Bridging Data and Decision-Making: AI's Role in Modern Analytics
Lucas Thelosen and Drew Gilson, co-founders of Gravity, delve into the transformative impact of AI in data analytics. They discuss their creation of Orion, an autonomous data analyst designed to bridge data and decision-making. The conversation highlights how AI democratizes access to data insights for businesses of all sizes, allowing data analysts to focus on strategic tasks. They also emphasize the importance of accuracy and trustworthiness in AI-driven workflows, sharing insights on how companies can cultivate a data-driven culture.

71 snips
Aug 5, 2025 • 50min
From Bits to Tables: The Evolution of S3 Storage
In this discussion, Andy Warfield, an Amazon storage enhancement expert, dives into the evolution of S3 storage. He explores the revolutionary functionalities of S3 Tables and Vectors, crucial for modern data management and analytics. Andy shares insights on how customer feedback has shaped these developments, improving performance for AI workloads. He also discusses the innovative applications of these features in industries like genomics and finance, along with the technical challenges faced in integrating advanced data types.

53 snips
Jul 28, 2025 • 52min
Revolutionizing Python Notebooks with Marimo
In this conversation, Akshay Agrawal, Co-founder and CEO of Marimo, introduces an open-source Python notebook environment. He tackles the drawbacks of traditional Jupyter notebooks, such as hidden state, and showcases Marimo’s reactive execution model and improved interactivity. Akshay also compares the tool's ability to integrate data apps with that of other platforms such as Jupyter and Streamlit. The talk covers the technical architecture, community-driven development, and future plans, including AI enhancements.

20 snips
Jul 21, 2025 • 55min
Warehouse Native Incremental Data Processing With Dynamic Tables And Delayed View Semantics
Dan Sotolongo, a principal engineer at Snowflake, shares insights on simplifying data engineering through incremental data processing and delayed view semantics. He dives into the complexities of managing evolving datasets in cloud warehouses, discussing how these concepts optimize resource use and reduce latency. The conversation contrasts traditional batch systems with dynamic tables and streaming solutions, emphasizing the need for a unified framework for semantic guarantees in data pipelines, and highlights the ongoing innovations in data integration and maintenance.


