The Analytics Engineering Podcast

Under the hood of Apache Iceberg (w/ Christian Thiel)

Aug 24, 2025
Christian Thiel, co-founder of Lakekeeper, discusses Apache Iceberg, an open table format for large analytic datasets. He covers its evolving ecosystem, the data architecture challenges it addresses, and why timely data matters for machine learning. The conversation explores data access mechanisms, secure credential management, and the features that are making Iceberg more enterprise-ready. Thiel also highlights flexible permission models and the role of Lakekeeper, an Iceberg catalog, in supporting data collaboration and integrity.
ANECDOTE

From NLP To Building Lakekeeper

  • Christian moved from NLP research into industrial ML and then into data engineering because production projects lacked timely, high-quality data.
  • That practical pain motivated him to build Lakekeeper and focus on Iceberg catalogs.
INSIGHT

Iceberg Enables Compute-Storage Separation

  • Apache Iceberg separates storage and compute by providing a shared metadata layer that multiple query engines can use.
  • This enables one canonical copy of data while swapping compute engines to avoid vendor lock-in.
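The idea of one canonical copy shared by several engines can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: `ToyCatalog`, its methods, the table name, and the `s3://` paths are all hypothetical and are not the API of Iceberg, Lakekeeper, or any query engine. It shows two independent readers resolving the same table through one shared pointer, and a stale writer being rejected when it tries to commit.

```python
# Toy model only: a hypothetical in-memory catalog, not a real Iceberg catalog API.
import threading


class ToyCatalog:
    """Maps a table name to the location of its current metadata file."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current_metadata(self, table):
        """Return the metadata location every engine should read from."""
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table, expected, new):
        """Swap the pointer only if it still equals what the writer last saw."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # someone else committed first
            self._pointers[table] = new
            return True


catalog = ToyCatalog()
catalog.commit("sales.orders", None, "s3://bucket/orders/metadata/v1.metadata.json")

# Two independent "engines" resolve the same table through the same catalog.
engine_a_view = catalog.current_metadata("sales.orders")
engine_b_view = catalog.current_metadata("sales.orders")

# Engine A commits a new version; engine B's commit is based on a stale view.
ok_a = catalog.commit(
    "sales.orders", engine_a_view, "s3://bucket/orders/metadata/v2.metadata.json"
)
ok_b = catalog.commit(
    "sales.orders", engine_b_view, "s3://bucket/orders/metadata/v3.metadata.json"
)
print(ok_a, ok_b, catalog.current_metadata("sales.orders"))
```

Because both engines go through the same pointer, swapping one engine for another does not require copying or rewriting the underlying data files.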
INSIGHT

Metadata Hierarchy Is The Core

  • Iceberg is fundamentally a metadata hierarchy sitting on top of file formats like Parquet.
  • The top-level table metadata JSON is the authoritative state that catalogs must manage centrally.
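The hierarchy can be sketched as a walk from the table metadata down to data files. This is a simplified illustration, not a parser for real tables: the keys `current-snapshot-id`, `snapshots`, `snapshot-id`, and `manifest-list` follow the field names in the Iceberg table spec, but the file names, the in-memory dictionaries, and the `data_files` helper are hypothetical. In real tables the manifest lists and manifests are Avro files rather than Python dicts.

```python
# Hand-written stand-ins for Iceberg metadata, for illustration only.
table_metadata = {
    "format-version": 2,
    "location": "s3://bucket/orders",
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "snap-1.avro"},
        {"snapshot-id": 2, "manifest-list": "snap-2.avro"},
    ],
}

# Manifest list -> manifests (assumed contents).
manifest_lists = {
    "snap-1.avro": ["manifest-a.avro"],
    "snap-2.avro": ["manifest-a.avro", "manifest-b.avro"],
}

# Manifest -> data files (assumed contents).
manifests = {
    "manifest-a.avro": ["data/part-0001.parquet"],
    "manifest-b.avro": ["data/part-0002.parquet"],
}


def data_files(metadata, snapshot_id=None):
    """Resolve a snapshot (default: the current one) to its Parquet files."""
    wanted = metadata["current-snapshot-id"] if snapshot_id is None else snapshot_id
    snapshot = next(s for s in metadata["snapshots"] if s["snapshot-id"] == wanted)
    files = []
    for manifest in manifest_lists[snapshot["manifest-list"]]:
        files.extend(manifests[manifest])
    return files


current_files = data_files(table_metadata)
older_files = data_files(table_metadata, snapshot_id=1)
print(current_files)
print(older_files)
```

Everything a reader needs is reachable from the single top-level metadata document, which is why the catalog only has to track which metadata file is current.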