EP10 - Optimizing Data Files in Apache Iceberg: Performance Strategies

Gnarly Data Waves by Dremio

The Importance of Data Lake Storage

The most important component of a data lakehouse is the table format. Today we have Iceberg and other formats. Hive, however, had a lot of performance issues and correctness issues, and it wasn't really great for analytics at scale. These problems are common in every organization doing large-scale analytics. Netflix did much of the main thinking about putting a bandage on each individual problem. With their Atlas system, they can see more quickly from one environment to another. It is just a time-series-based metric, though; they cannot see them all at once. And as you can see, within a few months or even weeks, things start going wrong.
