
EP10 - Optimizing Data Files in Apache Iceberg: Performance Strategies

Gnarly Data Waves by Dremio


The Benefits of Hidden Partitioning

Partitioning lets us do data skipping, which is where the performance benefit comes from, right? So now, instead of scanning every data file in the table for our query, the query engine can simply skip the data files in partitions B and C.

Our next optimization method is called compaction. When you're ingesting data frequently, especially with streaming data, there may not be enough new data in each write to create optimally sized data files. This can leave you with a large number of small files in your data lake.
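To make the two ideas concrete, here is a minimal Python sketch. It is a toy model, not Iceberg's API: the `DataFile` class, the `prune` and `plan_compaction` helpers, the file paths, the sizes, and the 100 MB target are all assumptions made up for illustration. It shows a filter on partition A skipping the files in partitions B and C, and then a simple greedy grouping of the remaining small files into batches that could each be rewritten as one larger file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for a data file entry in table metadata.
    path: str
    partition: str
    size_mb: int


def prune(files, wanted_partition):
    """Data skipping: keep only files whose partition matches the filter."""
    return [f for f in files if f.partition == wanted_partition]


def plan_compaction(files, target_mb):
    """Greedily group files so each group stays at or under target_mb."""
    groups, current, current_size = [], [], 0
    # Largest files first, so each group fills up quickly.
    for f in sorted(files, key=lambda f: f.size_mb, reverse=True):
        if current and current_size + f.size_mb > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_mb
    if current:
        groups.append(current)
    return groups


# Illustrative data only: three small files in partition A, one each in B and C.
files = [
    DataFile("a/1.parquet", "A", 40),
    DataFile("a/2.parquet", "A", 60),
    DataFile("a/3.parquet", "A", 30),
    DataFile("b/1.parquet", "B", 50),
    DataFile("c/1.parquet", "C", 70),
]

scanned = prune(files, "A")                       # files in B and C are skipped
groups = plan_compaction(scanned, target_mb=100)  # assumed target size

for i, group in enumerate(groups):
    total = sum(f.size_mb for f in group)
    print(f"group {i}: {[f.path for f in group]} -> {total} MB")
```

A real engine makes the pruning decision from the table's metadata rather than by looping over files, and a real compaction job also rewrites the data; this sketch only plans which files would go together.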

Play episode from 15:22