DataNation - Podcast for Data Engineers, Analysts and Scientists

Data News: DuckLake, Confluent’s TableFlow, New Book!

May 27, 2025
Dive into DuckLake and how it compares with table formats like Iceberg and Hudi. Discover the pros and cons of file-based versus database-backed catalogs and how they affect performance. Learn about Confluent's TableFlow and its role in streaming data into Iceberg catalogs. Explore the trade-offs between innovation and integration risk within the catalog ecosystem. Plus, find out why starting with a data lakehouse can be a game-changer for your data strategy!
INSIGHT

Database-Backed Table Format

  • DuckLake reimagines table formats by keeping the metadata and catalog inside a SQL database rather than in files on object storage.
  • Metadata queries are then answered by a running database instead of by reading lightweight catalog files, which changes the trade-offs around cost and scaling.
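The idea of a database-backed catalog can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the catalog database. The `data_files` table, its `min_ts`/`max_ts` columns, the `build_catalog` and `plan_scan` helpers, and the file paths are all hypothetical; this is not DuckLake's actual schema or API, only an illustration of scan planning expressed as one SQL query.

```python
import sqlite3


def build_catalog(files):
    """Create an in-memory catalog with one row per data file (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_files ("
        " path TEXT PRIMARY KEY,"
        " min_ts INTEGER NOT NULL,"
        " max_ts INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO data_files VALUES (?, ?, ?)", files)
    conn.commit()
    return conn


def plan_scan(conn, lo, hi):
    """Return the paths of files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    rows = conn.execute(
        "SELECT path FROM data_files"
        " WHERE max_ts >= ? AND min_ts <= ?"
        " ORDER BY path",
        (lo, hi),
    )
    return [path for (path,) in rows]


# Illustrative file list; the paths and statistics are made up.
catalog = build_catalog([
    ("s3://bucket/events/part-0.parquet", 0, 99),
    ("s3://bucket/events/part-1.parquet", 100, 199),
    ("s3://bucket/events/part-2.parquet", 200, 299),
])
print(plan_scan(catalog, 150, 250))
```

In this sketch the planner issues a single query against the database and never lists or reads metadata files on object storage, which is the trade-off the snip describes.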
INSIGHT

Centralized Planning Cuts Calls But Adds Contention

  • Centralizing scan planning in a database can reduce repeated S3 metadata requests, because planning queries are answered from the database's block storage.
  • But consolidating planning may concentrate compute in one place and create new contention points compared with file-based planning, where each engine plans in parallel.
ADVICE

Test Scalability Before Adopting DuckLake

  • Evaluate scaling behavior before choosing DuckLake: test how scan planning and metadata load behave under concurrent workloads.
  • Prefer file-based metadata and catalogs when you need many independent engines to plan queries concurrently at large scale.
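One way to approach such a test is a small harness that runs the same planning call at several concurrency levels and compares latencies. The sketch below is an assumption about how this could be set up, not a method described in the episode: `measure_planning`, `timed_call`, and `fake_plan` are hypothetical names, and `fake_plan` only sleeps to stand in for a real planning request against your catalog.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def timed_call(plan_fn):
    """Run one planning call and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    plan_fn()
    return time.perf_counter() - start


def measure_planning(plan_fn, concurrency, total_requests):
    """Issue total_requests planning calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(plan_fn), range(total_requests))
        )
    latencies.sort()
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "p50_s": median(latencies),
        "max_s": latencies[-1],
    }


def fake_plan():
    """Placeholder for a real scan-planning request (hypothetical)."""
    time.sleep(0.001)


for workers in (1, 4, 16):
    print(measure_planning(fake_plan, workers, 32))
```

Replacing `fake_plan` with a real planning call and watching how the median and maximum latency change as `workers` grows gives a first signal of whether the catalog becomes a contention point.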