
The Joe Reis Show
Why AI Agents Need a New Lakehouse. Ciro Greco (Bauplan) on “Git for Data”
Nov 26, 2025
Ciro Greco, Co-founder and CEO of Bauplan, dives into the future of data infrastructure with a focus on a “Code-First” approach. He explains how traditional data stacks fail autonomous AI agents and why a programmable lakehouse is vital. Ciro introduces “Git for Data” semantics, detailing how features like branching and transactionality create safe environments for agents to work without corrupting production data. He also shares insights on the evolving roles of data teams and the growing need for principled controls amid increasing automation.
AI Snips
Code-First Lakehouse Mentality
- Treat the lakehouse as code-first: data should be managed like software with simple APIs and transactional guarantees.
- Bauplan abstracts away infrastructure and runtimes so engineers can treat data engineering as code rather than plumbing.
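A minimal sketch of what treating data engineering as code can look like, assuming tables are plain lists of Python dicts and every pipeline step is an ordinary function. `clean_orders`, `daily_revenue` and `run_pipeline` are hypothetical names chosen for illustration; this is not Bauplan's API.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical representation: a table is a list of rows, a row is a dict.
Row = dict
Table = list[Row]


def clean_orders(orders: Table) -> Table:
    """Keep only rows with a present, positive amount."""
    return [r for r in orders if r.get("amount") is not None and r["amount"] > 0]


def daily_revenue(orders: Table) -> Table:
    """Sum order amounts per day, sorted by day."""
    totals: dict[str, float] = defaultdict(float)
    for r in orders:
        totals[r["day"]] += r["amount"]
    return [{"day": d, "revenue": totals[d]} for d in sorted(totals)]


def run_pipeline(raw: Table) -> Table:
    # The pipeline is ordinary function composition: no infrastructure code here.
    return daily_revenue(clean_orders(raw))


if __name__ == "__main__":
    raw = [
        {"day": "2025-11-25", "amount": 10.0},
        {"day": "2025-11-25", "amount": None},
        {"day": "2025-11-26", "amount": 5.5},
        {"day": "2025-11-26", "amount": -1.0},
    ]
    print(run_pipeline(raw))
```

Because each step is a pure function of its input tables, it can be reviewed, versioned and unit-tested like any other code, which is the point of the code-first framing.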
Enforce Git Semantics On Data
- Use Git-like primitives (branching, isolation, transactionality) to make multi-table operations auditable and reversible.
- Build runtime guarantees so pipelines behave like database transactions and fail safely in isolated branches.
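The Git-like primitives above can be illustrated with a toy in-memory catalog, assuming a branch is just a named pointer to an immutable snapshot of all tables. `ToyCatalog`, `Commit` and their methods are hypothetical and are not Bauplan's API; a real lakehouse persists snapshots durably rather than in Python dicts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]
    tables: dict  # table name -> tuple of rows, treated as immutable
    message: str


class ToyCatalog:
    """Hypothetical in-memory catalog illustrating Git-like semantics for tables."""

    def __init__(self) -> None:
        root = Commit(id=0, parent=None, tables={}, message="init")
        self.commits: dict[int, Commit] = {0: root}
        self.branches: dict[str, int] = {"main": 0}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch is only a pointer to an existing commit; no data is copied.
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.branches[from_branch]

    def read(self, branch: str, table: str) -> tuple:
        return self.commits[self.branches[branch]].tables.get(table, ())

    def commit(self, branch: str, changes: dict, message: str) -> int:
        # Build the complete new snapshot first; the branch pointer moves only
        # at the very end, so a multi-table change lands entirely or not at all.
        head = self.commits[self.branches[branch]]
        new_tables = dict(head.tables)
        for name, rows in changes.items():
            new_tables[name] = tuple(rows)
        new_id = len(self.commits)
        self.commits[new_id] = Commit(new_id, head.id, new_tables, message)
        self.branches[branch] = new_id
        return new_id

    def log(self, branch: str) -> list[str]:
        # Walk parent links to produce an audit trail, newest first.
        out, cid = [], self.branches[branch]
        while cid is not None:
            c = self.commits[cid]
            out.append(f"{c.id}: {c.message}")
            cid = c.parent
        return out

    def revert_to(self, branch: str, commit_id: int) -> None:
        # Reversibility: point the branch back at an earlier snapshot.
        self.branches[branch] = commit_id
```

For example, committing `orders` and `customers` together on `main`, branching `dev`, and truncating `orders` on `dev` leaves `main` untouched; `log("main")` then lists both commits, and `revert_to("dev", ...)` restores the earlier snapshot. Keeping old snapshots immutable is what makes both the audit trail and the revert cheap in this sketch.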
Run Pipelines In Branch Sandboxes
- Always run pipelines in isolated branches so partial failures never corrupt production data.
- Inspect the branch's code, runtime, and data to debug and deterministically reproduce failures before merging.
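The branch-sandbox workflow can be sketched as follows, assuming a hypothetical `SandboxRunner` that copies `main` into a new branch, runs each pipeline step against the copy, and merges only when every step succeeds. The deep copy stands in for whatever isolation mechanism a real system uses, and the fast-forward check in `merge` is one simple merge policy chosen for illustration, not Bauplan's.

```python
from __future__ import annotations

import copy
from typing import Callable

# Hypothetical representation: table name -> list of row dicts.
Tables = dict[str, list[dict]]
Step = Callable[[Tables], None]


class SandboxRunner:
    """Hypothetical sketch: run a pipeline on an isolated branch, merge only on success."""

    def __init__(self, main: Tables) -> None:
        self.branches: dict[str, Tables] = {"main": copy.deepcopy(main)}
        self.base: dict[str, Tables] = {}  # snapshot of main each branch started from
        self.last_error: tuple[str, str] | None = None

    def run(self, branch: str, steps: list[Step]) -> bool:
        if branch in self.branches:
            raise ValueError(f"branch {branch!r} already exists")
        # Isolation: the branch works on its own copy, so no step can touch main.
        self.base[branch] = copy.deepcopy(self.branches["main"])
        self.branches[branch] = copy.deepcopy(self.branches["main"])
        try:
            for step in steps:
                step(self.branches[branch])
        except Exception as exc:
            # Failure: main is unchanged and the branch is kept for inspection.
            self.last_error = (branch, repr(exc))
            return False
        self.merge(branch)
        return True

    def merge(self, branch: str) -> None:
        # Fast-forward policy: refuse if main changed after the branch was created.
        if self.branches["main"] != self.base[branch]:
            raise RuntimeError(f"main moved since {branch!r} was created")
        self.branches["main"] = copy.deepcopy(self.branches[branch])


def drop_refunds(t: Tables) -> None:
    t["orders"] = [r for r in t["orders"] if r["amount"] > 0]


def wipe_orders(t: Tables) -> None:
    # A buggy step, used to show that failures stay on the branch.
    t["orders"] = []


def check_not_empty(t: Tables) -> None:
    # A data-quality gate: raising here aborts the run before any merge.
    if not t["orders"]:
        raise ValueError("orders is empty")


if __name__ == "__main__":
    runner = SandboxRunner({"orders": [{"id": 1, "amount": 5}, {"id": 2, "amount": -5}]})
    print(runner.run("fix-refunds", [drop_refunds, check_not_empty]))
    print(runner.run("bad-run", [wipe_orders, check_not_empty]))
    print(runner.branches["main"], runner.last_error)
```

After the failed `bad-run`, `main` still holds the cleaned orders, while the `bad-run` branch keeps the wiped table and `last_error` records what went wrong, so the failure can be inspected and rerun from the same starting snapshot before anything is merged.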
