
EP10 - Optimizing Data Files in Apache Iceberg Performance Strategies
Gnarly Data Waves by Dremio
00:00
The Importance of Data Lake Storage
The most important thing that makes up a data lake house is the table format. We have today, iceberg and other formats. But hive also had this a lot of performance issues,. correctness issues, and it wasn't really great for scale analytics. So these problems are common in every organization doing large-scale analytics. Netflix were kind of the main thinking behind putting a bandage on each individual problem. They can see more quickly from one environment to another with their Atlas system. It's just a time series based metric; they cannot see them all at once. And as you can see, within a few months or even weeks, things start going wrong.
Play episode from 03:52
Transcript


