
DataNation - Podcast for Data Engineers, Analysts and Scientists
Data News: DuckLake, Confluent’s TableFlow, New Book!
May 27, 2025
Dive into DuckLake, a new table format entering territory already occupied by Iceberg and Hudi. Discover the pros and cons of file-based versus database-backed catalogs and how each affects performance. Learn about Confluent's TableFlow and its role in streaming data into Iceberg catalogs. Explore the trade-off between innovation and integration risk within the catalog ecosystem. Plus, find out why starting with a data lakehouse can be a game-changer for your data strategy!
Database-Backed Table Format
- DuckLake reimagines table formats by storing the metadata and catalog in a SQL database rather than as files on object storage; the data files themselves stay on object storage.
- Metadata lookups therefore become queries against a running database instead of reads from lightweight, file-based catalogs, which changes the trade-offs around cost and scaling.
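To make the database-backed idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as the catalog database. The single `data_files` table, its snapshot and min/max columns, the helper names, and the `s3://bucket/...` paths are all hypothetical simplifications for illustration, not DuckLake's actual schema. The point it shows: once metadata lives in a database, planning a scan (including time travel to an older snapshot) becomes one SQL query instead of a walk over metadata files.

```python
import sqlite3

# Hypothetical, heavily simplified catalog schema for illustration only;
# it is NOT DuckLake's real schema.
SCHEMA = """
CREATE TABLE data_files (
    table_name TEXT NOT NULL,
    path       TEXT NOT NULL,     -- object-storage location of a data file
    min_id     INTEGER NOT NULL,  -- per-file column statistics used for pruning
    max_id     INTEGER NOT NULL,
    added_in   INTEGER NOT NULL,  -- snapshot that added the file
    removed_in INTEGER            -- snapshot that removed it (NULL = still live)
);
"""


def open_catalog() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_file(conn, table, path, min_id, max_id, snapshot):
    """Record a newly written data file as part of `snapshot`."""
    conn.execute(
        "INSERT INTO data_files VALUES (?, ?, ?, ?, ?, NULL)",
        (table, path, min_id, max_id, snapshot),
    )


def remove_file(conn, path, snapshot):
    """Mark a file as deleted from `snapshot` onward; older snapshots still see it."""
    conn.execute(
        "UPDATE data_files SET removed_in = ? WHERE path = ? AND removed_in IS NULL",
        (snapshot, path),
    )


def plan_scan(conn, table, snapshot, lo, hi):
    """Scan planning as one SQL query: files live at `snapshot` whose id range overlaps [lo, hi]."""
    rows = conn.execute(
        """
        SELECT path FROM data_files
        WHERE table_name = ?
          AND added_in <= ?
          AND (removed_in IS NULL OR removed_in > ?)
          AND max_id >= ? AND min_id <= ?
        ORDER BY path
        """,
        (table, snapshot, snapshot, lo, hi),
    ).fetchall()
    return [path for (path,) in rows]


if __name__ == "__main__":
    conn = open_catalog()
    register_file(conn, "events", "s3://bucket/events/f1.parquet", 0, 99, snapshot=1)
    register_file(conn, "events", "s3://bucket/events/f2.parquet", 100, 199, snapshot=2)
    remove_file(conn, "s3://bucket/events/f1.parquet", snapshot=3)
    print(plan_scan(conn, "events", 2, 0, 500))  # ['s3://bucket/events/f1.parquet', 's3://bucket/events/f2.parquet']
    print(plan_scan(conn, "events", 3, 0, 500))  # ['s3://bucket/events/f2.parquet']
```

The cost side of the trade-off is visible here too: every `plan_scan` call needs a live database connection, whereas a file-based catalog only needs read access to object storage.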
Centralized Planning Cuts Calls But Adds Contention
- Centralizing scan planning in a database can cut repeated S3 requests for metadata, because the database answers planning queries from its own block storage.
- But consolidating planning may concentrate compute in one place and create new contention points, compared with file-based planning, where each engine reads the metadata files and plans in parallel.
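The request-count side of this trade-off can be sketched with a toy model. Everything below is hypothetical: `CountingObjectStore` stands in for S3, the metadata tree is a simplified Iceberg-like chain (root pointer, manifest list, manifests), and `CountingCatalogDB` stands in for a database-backed catalog that answers each planning request with a single query. It only counts requests; it does not model latency, caching, or real storage.

```python
from dataclasses import dataclass, field


@dataclass
class CountingObjectStore:
    """Hypothetical in-memory stand-in for S3 that counts GET requests."""
    objects: dict[str, object] = field(default_factory=dict)
    gets: int = 0

    def get(self, key: str) -> object:
        self.gets += 1
        return self.objects[key]


@dataclass
class CountingCatalogDB:
    """Hypothetical central catalog database: one query per planning request."""
    data_files: list[str]
    queries: int = 0

    def plan(self) -> list[str]:
        self.queries += 1
        return list(self.data_files)


def build_table(store: CountingObjectStore, n_manifests: int, files_per_manifest: int) -> list[str]:
    """Write a simplified metadata tree (root -> manifest list -> manifests); returns all data files."""
    all_files, manifest_keys = [], []
    for m in range(n_manifests):
        files = [f"data/{m}-{i}.parquet" for i in range(files_per_manifest)]
        store.objects[f"meta/manifest-{m}"] = files
        manifest_keys.append(f"meta/manifest-{m}")
        all_files.extend(files)
    store.objects["meta/manifest-list"] = manifest_keys
    store.objects["meta/root"] = "meta/manifest-list"
    return all_files


def plan_from_files(store: CountingObjectStore) -> list[str]:
    """File-based planning: each planner walks the metadata tree with its own GETs."""
    files = []
    for manifest_key in store.get(store.get("meta/root")):
        files.extend(store.get(manifest_key))
    return files


def compare(planners: int, n_manifests: int, files_per_manifest: int) -> tuple[int, int]:
    """Return (object-store GETs, catalog-database queries) for `planners` independent plans."""
    store = CountingObjectStore()
    db = CountingCatalogDB(data_files=build_table(store, n_manifests, files_per_manifest))
    for _ in range(planners):
        if plan_from_files(store) != db.plan():
            raise RuntimeError("the two planning paths disagree")
    return store.gets, db.queries


if __name__ == "__main__":
    print(compare(planners=10, n_manifests=5, files_per_manifest=3))  # (70, 10)
```

With 10 planners and a 5-manifest table, file-based planning issues 2 + 5 GETs per planner (70 in total) versus 10 database queries. The GETs, however, are spread across an object store built for massive parallelism, while all 10 queries land on one database, which is where the contention described above can appear.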
Test Scalability Before Adopting DuckLake
- Evaluate scaling behavior before choosing DuckLake: test how scan planning and metadata load behave under concurrent workloads.
- Prefer file-based metadata/catalogs when you need many independent engines to plan queries concurrently at large scale.
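A starting point for such a test is a small harness that measures planning latency as the number of concurrent callers grows. This is a generic, hypothetical sketch: `serialized_plan` simulates a catalog that handles one planning request at a time (a lock held for a fixed sleep), and you would replace it with a call that plans a real query against the catalog you are evaluating.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def percentile(values: list[float], pct: float) -> float:
    """Simple nearest-rank percentile; good enough for a quick benchmark."""
    ordered = sorted(values)
    index = round(pct / 100 * (len(ordered) - 1))
    return ordered[index]


def run_load_test(plan_once: Callable[[], None], concurrency: int, requests_per_worker: int) -> dict[str, float]:
    """Call `plan_once` from `concurrency` threads and report latency percentiles in milliseconds."""

    def worker() -> list[float]:
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            plan_once()
            latencies.append((time.perf_counter() - start) * 1000)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        all_latencies = [ms for f in futures for ms in f.result()]

    return {
        "requests": len(all_latencies),
        "p50_ms": percentile(all_latencies, 50),
        "p95_ms": percentile(all_latencies, 95),
    }


# Hypothetical stand-in for a catalog that serializes planning requests
# (for example, behind a single database connection). Replace with a real planning call.
_catalog_lock = threading.Lock()


def serialized_plan(service_time_s: float = 0.005) -> None:
    with _catalog_lock:
        time.sleep(service_time_s)


if __name__ == "__main__":
    for concurrency in (1, 4, 16):
        stats = run_load_test(serialized_plan, concurrency, requests_per_worker=5)
        print(concurrency, stats)
```

If p50 and p95 climb roughly in step with concurrency, as they do for the lock-based stand-in, planning is being serialized somewhere; a setup whose planning scales out across independent engines would show a much flatter curve.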
