DataNation - Podcast for Data Engineers, Analysts and Scientists

Data News: DuckLake, Confluent’s TableFlow, New Book!

May 27, 2025

Dive into the fascinating world of DuckLake, where table formats like Iceberg and Hudi collide. Discover the pros and cons of file-based versus database-backed catalogs and how they impact performance. Learn about Confluent's TableFlow and its role in streaming data into Iceberg catalogs. Explore the trade-offs between innovation and integration risks within the catalog ecosystem. Plus, find out why starting with a data lakehouse can be a game-changer for your data strategy!

INSIGHT

Database-Backed Table Format

DuckLake reimagines table formats by keeping the catalog and table metadata in a SQL database rather than in files on object storage.
Metadata lookups then run as queries against a live database instead of reads of lightweight metadata files, which changes the trade-offs around cost and scaling.
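
To make the idea concrete, here is a minimal sketch of scan planning against a catalog held in a SQL database. It uses SQLite from the Python standard library as a stand-in. The `data_file` table, its columns, and the `plan_scan` helper are assumptions made up for illustration and are not DuckLake's actual schema or API.

```python
import sqlite3

# Hypothetical catalog schema for illustration only; NOT DuckLake's real tables.
SCHEMA = """
CREATE TABLE data_file (
    table_name TEXT NOT NULL,
    path       TEXT NOT NULL,
    min_ts     INTEGER NOT NULL,  -- smallest event timestamp in the file
    max_ts     INTEGER NOT NULL   -- largest event timestamp in the file
);
"""


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog describing three data files."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO data_file VALUES (?, ?, ?, ?)",
        [
            ("events", "s3://bucket/events/part-0.parquet", 0, 99),
            ("events", "s3://bucket/events/part-1.parquet", 100, 199),
            ("events", "s3://bucket/events/part-2.parquet", 200, 299),
        ],
    )
    return conn


def plan_scan(conn: sqlite3.Connection, table: str, lo: int, hi: int) -> list[str]:
    """Return paths of files whose [min_ts, max_ts] range overlaps [lo, hi]."""
    rows = conn.execute(
        "SELECT path FROM data_file "
        "WHERE table_name = ? AND max_ts >= ? AND min_ts <= ? "
        "ORDER BY path",
        (table, lo, hi),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    catalog = build_catalog()
    # Only files overlapping the requested range need to be read.
    print(plan_scan(catalog, "events", 150, 250))
```

In this sketch, deciding which files to read is a single query against the database; no metadata files on object storage are opened during planning.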

INSIGHT

Centralized Planning Cuts Calls But Adds Contention

Centralizing scan planning in a database can reduce repeated S3 metadata requests, because planning queries are answered from the database's block storage.
But consolidating planning may also concentrate compute in one place and create contention points that file-based planning, which each engine performs in parallel, avoids.

ADVICE

Test Scalability Before Adopting DuckLake

Evaluate scaling behavior before choosing DuckLake: test how scan planning and metadata load behave under concurrent workloads.
Prefer file-based metadata/catalogs when you need many independent engines to plan queries concurrently at large scale.
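
One way to approach such a test is a small harness that issues the same planning query from a growing number of concurrent clients and records latency. The sketch below is only the shape of that test: SQLite stands in for the catalog database, and the `data_file` table, `plan_once`, and `run_load` are hypothetical names. Its numbers say nothing about how DuckLake or any production database behaves; for a real evaluation, point the same loop at the actual catalog.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def make_catalog(path: str, num_files: int) -> None:
    """Create a toy on-disk catalog (hypothetical schema) with num_files entries."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE data_file (path TEXT, min_ts INTEGER, max_ts INTEGER)")
    conn.executemany(
        "INSERT INTO data_file VALUES (?, ?, ?)",
        [(f"part-{i}.parquet", i * 100, i * 100 + 99) for i in range(num_files)],
    )
    conn.commit()
    conn.close()


def plan_once(path: str, lo: int, hi: int) -> tuple[int, float]:
    """Simulate one engine: open its own connection and time one planning query."""
    start = time.perf_counter()
    conn = sqlite3.connect(path)
    try:
        matched = conn.execute(
            "SELECT COUNT(*) FROM data_file WHERE max_ts >= ? AND min_ts <= ?",
            (lo, hi),
        ).fetchone()[0]
    finally:
        conn.close()
    return matched, time.perf_counter() - start


def run_load(path: str, workers: int, queries: int) -> dict:
    """Run `queries` planning calls across `workers` threads and summarize latency."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: plan_once(path, 0, 999), range(queries)))
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "workers": workers,
        "files_matched": results[0][0],
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "catalog.db")
        make_catalog(db_path, num_files=1000)
        # Watch how median and worst-case latency move as concurrency grows.
        for workers in (1, 8, 32):
            print(run_load(db_path, workers=workers, queries=200))
```

If median or worst-case planning latency climbs sharply as the worker count grows, that is the kind of contention the advice above suggests checking for before committing.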
