

Under the hood of Apache Iceberg (w/ Christian Thiel)
47 snips Aug 24, 2025
Christian Thiel, co-founder of Lakekeeper, discusses Apache Iceberg, an open table format for large analytic datasets. He covers its evolving ecosystem, challenges in data architecture, and the importance of timely data for machine learning. The conversation explores data access mechanisms, secure credential management, and features that improve enterprise readiness. Thiel also highlights the flexibility of permission models and the role of Lakekeeper in enhancing data collaboration and integrity.
AI Snips
From NLP To Building Lakekeeper
- Christian moved from NLP research into industrial ML and then into data engineering because production projects lacked timely, high-quality data.
- That practical pain motivated him to build Lakekeeper and focus on Iceberg catalogs.
Iceberg Enables Compute-Storage Separation
- Apache Iceberg separates storage and compute by providing a shared metadata layer that multiple query engines can use.
- This lets teams keep one canonical copy of the data and swap compute engines freely, which avoids vendor lock-in.
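To make the shared-metadata idea concrete, here is a minimal Python sketch. It is a toy model, not Iceberg's API: `ToyTable`, `ToyCatalog`, `WriterEngine`, and `ReaderEngine` are hypothetical names invented for illustration. It only shows that two independent engines can see the same data when both resolve it through one metadata layer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for table metadata: just a list of data file paths."""
    data_files: list = field(default_factory=list)


class ToyCatalog:
    """Hypothetical shared metadata layer: maps table names to table state."""

    def __init__(self):
        self._tables = {}

    def create(self, name):
        self._tables[name] = ToyTable()

    def load(self, name):
        return self._tables[name]


class WriterEngine:
    """One compute engine that appends files to the shared table."""

    def __init__(self, catalog):
        self.catalog = catalog

    def append(self, name, path):
        self.catalog.load(name).data_files.append(path)


class ReaderEngine:
    """A different compute engine that plans a scan from the same metadata."""

    def __init__(self, catalog):
        self.catalog = catalog

    def plan_scan(self, name):
        return list(self.catalog.load(name).data_files)


catalog = ToyCatalog()
catalog.create("db.events")

# The writer and the reader never talk to each other, only to the catalog.
WriterEngine(catalog).append("db.events", "data/part-0.parquet")
files_seen = ReaderEngine(catalog).plan_scan("db.events")
print(files_seen)
```

Either engine class could be replaced without touching the stored files, which is the lock-in argument in miniature.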
Metadata Hierarchy Is The Core
- Iceberg is fundamentally a metadata hierarchy sitting on top of file formats like Parquet.
- The top-level table metadata JSON is the authoritative state that catalogs must manage centrally.
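The hierarchy described above can be sketched roughly as follows. This is a simplified toy model under stated assumptions: the keys `current-snapshot-id`, `snapshots`, `snapshot-id`, and `manifest-list` follow the Iceberg table metadata format, but real manifest lists and manifests are Avro files, so the JSON stand-ins here (with the made-up keys `manifests` and `data-files`) are for illustration only. The in-memory `store` and the `ToyCatalog` class are hypothetical as well.

```python
import json

# In-memory stand-in for object storage: path -> file contents.
store = {}


def put_json(path, obj):
    store[path] = json.dumps(obj)


def get_json(path):
    return json.loads(store[path])


def data_files(metadata_path):
    """Walk table metadata -> current snapshot -> manifest list -> manifests -> data files."""
    meta = get_json(metadata_path)
    current = meta["current-snapshot-id"]
    snapshot = next(s for s in meta["snapshots"] if s["snapshot-id"] == current)
    files = []
    for manifest_path in get_json(snapshot["manifest-list"])["manifests"]:
        files.extend(get_json(manifest_path)["data-files"])
    return files


class ToyCatalog:
    """Hypothetical catalog: holds one pointer per table to its current metadata file."""

    def __init__(self):
        self._pointers = {}

    def current(self, table):
        return self._pointers.get(table)

    def commit(self, table, expected, new):
        # Compare-and-swap: only move the pointer if nobody else moved it first.
        if self._pointers.get(table) != expected:
            return False
        self._pointers[table] = new
        return True


# Build the hierarchy bottom-up for a first snapshot.
put_json("metadata/manifest-1.json", {"data-files": ["data/a.parquet", "data/b.parquet"]})
put_json("metadata/manifest-list-1.json", {"manifests": ["metadata/manifest-1.json"]})
put_json("metadata/v1.metadata.json", {
    "current-snapshot-id": 1,
    "snapshots": [{"snapshot-id": 1, "manifest-list": "metadata/manifest-list-1.json"}],
})

catalog = ToyCatalog()
first_commit = catalog.commit("db.events", None, "metadata/v1.metadata.json")

# A second snapshot adds one manifest and is written as a new metadata file.
put_json("metadata/manifest-2.json", {"data-files": ["data/c.parquet"]})
put_json("metadata/manifest-list-2.json", {
    "manifests": ["metadata/manifest-1.json", "metadata/manifest-2.json"],
})
put_json("metadata/v2.metadata.json", {
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "metadata/manifest-list-1.json"},
        {"snapshot-id": 2, "manifest-list": "metadata/manifest-list-2.json"},
    ],
})

second_commit = catalog.commit("db.events", "metadata/v1.metadata.json", "metadata/v2.metadata.json")
# A writer that still believes v1 is current loses the race and must retry.
stale_commit = catalog.commit("db.events", "metadata/v1.metadata.json", "metadata/v3.metadata.json")

current_files = data_files(catalog.current("db.events"))
```

The point of the sketch is that readers only ever need the single pointer held by the catalog; everything else is reachable from the metadata file it names, which is why that pointer has to be managed in one place.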