The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

CHAPTER

Do You Have a Coordinated CDC Pattern?

The current state of the art for CDC patterns is to try either the copy-on-write or merge-on-read approach. But in that case, you've got a Kafka topic that's partitioned across 15 different partitions. And who knows what points in time all those records are at? So if you randomly batch them up through like a Flink or Kafka Connect process, the downstream table actually has like a completely random table state that doesn't coordinate with basically any time in the upstream table. That really fascinates me. I'm like, how is anyone having success with these CDC tables that are basically like a random table state that is not actually a point in time? The industry needs to lead
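
To make the point about uncoordinated partitions concrete, here is a minimal sketch in plain Python. It is not Kafka, Flink, or Iceberg code. The names `upstream_history`, `apply_batch`, and `partition_of`, the two-partition layout, and the sample values are all assumptions made for illustration. It models a batch that happens to have read further into one partition than the other, then checks whether the resulting downstream table matches any state the upstream table ever had.

```python
from typing import Dict, List, Tuple

# Hypothetical change event: (upstream commit sequence, row key, new value).
Change = Tuple[int, str, int]

# Upstream changes in commit order (illustrative values only).
changes: List[Change] = [
    (1, "a", 100),
    (2, "b", 100),
    (3, "b", 150),
    (4, "a", 50),
]

# Assumed fixed key-to-partition mapping; a real topic would hash the key.
partition_of: Dict[str, int] = {"a": 0, "b": 1}


def upstream_history(log: List[Change]) -> List[Dict[str, int]]:
    """Return every state the upstream table passed through, in commit order."""
    state: Dict[str, int] = {}
    history = [dict(state)]
    for _, key, value in log:
        state[key] = value
        history.append(dict(state))
    return history


def apply_batch(
    partitions: Dict[int, List[Change]], consumed: Dict[int, int]
) -> Dict[str, int]:
    """Apply the first consumed[p] records of each partition to an empty table."""
    table: Dict[str, int] = {}
    for p, log in partitions.items():
        for _, key, value in log[: consumed[p]]:
            table[key] = value
    return table


# Split the change stream into per-partition logs; order is kept per partition only.
partitions: Dict[int, List[Change]] = {0: [], 1: []}
for change in changes:
    partitions[partition_of[change[1]]].append(change)

history = upstream_history(changes)

# The batch read both records from partition 0 but only one from partition 1.
downstream = apply_batch(partitions, consumed={0: 2, 1: 1})
matches_upstream = downstream in history

# Once every partition is fully consumed, the table converges to the latest state.
caught_up = apply_batch(partitions, consumed={0: 2, 1: 2})

print(downstream, matches_upstream)
print(caught_up, caught_up == history[-1])
```

In this sketch the lagging batch produces a table in which `a` already reflects commit 4 while `b` still reflects commit 2, a combination the upstream table never held. The table only lines up with an upstream point in time once every partition has been read to the same commit.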
