The Joe Reis Show

Why AI Agents Need a New Lakehouse. Ciro Greco (Bauplan) on “Git for Data”

Nov 26, 2025
Ciro Greco, Co-founder and CEO of Bauplan, dives into the future of data infrastructure with a focus on a 'Code-First' approach. He explains how traditional data stacks fail autonomous AI agents and why a programmable lakehouse is vital. Ciro introduces 'Git for Data' semantics, detailing how features like branching and transactionality create safe environments for agents to work without corrupting production data. He also shares insights on the evolving roles of data teams and the growing need for principled controls amid increasing automation.
INSIGHT

Code-First Lakehouse Mentality

  • Treat the lakehouse as code-first: data should be managed like software with simple APIs and transactional guarantees.
  • Bauplan abstracts infrastructure and runtimes so engineers can treat data engineering as code, not plumbing.
ADVICE

Enforce Git Semantics On Data

  • Use Git-like primitives (branching, isolation, transactionality) to make multi-table operations auditable and reversible.
  • Build runtime guarantees so pipelines behave like database transactions and fail safely in isolated branches.
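
To make the Git-like primitives above concrete, here is a minimal in-memory sketch in Python. It is not Bauplan's API: the `Catalog` class and its `create_branch`, `write`, `merge`, and `revert` methods are hypothetical names chosen for illustration, and tables are plain Python lists. The merge simply replaces the target branch's whole state in one step, which ignores conflict handling.

```python
import copy


class Catalog:
    """Toy catalog (hypothetical): each branch maps table names to lists of rows."""

    def __init__(self):
        self.branches = {"main": {}}
        self.log = []  # audit trail of (action, detail) tuples

    def create_branch(self, name, from_branch="main"):
        # A branch starts as an isolated deep copy of its parent.
        self.branches[name] = copy.deepcopy(self.branches[from_branch])
        self.log.append(("branch", name))

    def write(self, branch, table, rows):
        # Writes touch only the named branch.
        self.branches[branch][table] = list(rows)
        self.log.append(("write", f"{branch}.{table}"))

    def merge(self, source, into="main"):
        # Publish every table from the source branch in one step and
        # return the previous state so the merge can be undone.
        previous = self.branches[into]
        self.branches[into] = copy.deepcopy(self.branches[source])
        self.log.append(("merge", f"{source}->{into}"))
        return previous

    def revert(self, branch, snapshot):
        # Restore a branch to a state captured before a merge.
        self.branches[branch] = snapshot
        self.log.append(("revert", branch))


catalog = Catalog()
catalog.write("main", "orders", [1, 2])

# A multi-table change happens in isolation on a branch.
catalog.create_branch("feature")
catalog.write("feature", "orders", [1, 2, 3])
catalog.write("feature", "customers", ["a"])
main_before_merge = copy.deepcopy(catalog.branches["main"])

# Both tables become visible on main together, and the merge is reversible.
snapshot = catalog.merge("feature")
main_after_merge = copy.deepcopy(catalog.branches["main"])
catalog.revert("main", snapshot)
main_after_revert = catalog.branches["main"]
```

In this sketch, `catalog.log` plays the role of the audit trail, and the snapshot returned by `merge` is what makes the multi-table change reversible.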
ADVICE

Run Pipelines In Branch Sandboxes

  • Always run pipelines in isolated branches so partial failures never corrupt production data.
  • Inspect the branch's code, runtime, and data to debug and deterministically reproduce failures before merging.
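
The branch-sandbox pattern above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Bauplan's runtime: `run_in_branch`, `build_totals`, and `broken_step` are hypothetical names, "production" is a plain dictionary of tables, and the final publish step is not truly atomic the way a real catalog commit would be.

```python
import copy


def run_in_branch(production, steps):
    """Run steps against a copy of production and publish only if all succeed.

    Returns (ok, branch, error). On failure the branch is returned as-is
    so its partial state can be inspected.
    """
    branch = copy.deepcopy(production)  # isolated sandbox
    try:
        for step in steps:
            step(branch)
    except Exception as exc:
        # Production was never touched; keep the branch for debugging.
        return False, branch, exc
    # Every step succeeded: publish the branch's tables to production.
    production.clear()
    production.update(branch)
    return True, branch, None


def build_totals(tables):
    # Hypothetical pipeline step: derive a table from existing data.
    tables["totals"] = [sum(tables["orders"])]


def broken_step(tables):
    # Hypothetical pipeline step that fails partway through the run.
    raise ValueError("schema mismatch")


prod = {"orders": [10, 20]}

# A run with a failing step leaves production unchanged.
ok, failed_branch, err = run_in_branch(prod, [build_totals, broken_step])
prod_after_failure = copy.deepcopy(prod)

# After fixing the pipeline, the same run merges cleanly.
ok_retry, _, _ = run_in_branch(prod, [build_totals])
```

After the failed run, `failed_branch` still holds the partially built `totals` table, which is the state one would inspect to reproduce the failure before attempting a merge.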