
#491: DuckDB and Python: Ducks and Snakes living together

Talk Python To Me


Columnar Database Indexing

Columnar databases like DuckDB automatically build coarse indexes roughly every 100,000 rows. Each one stores the minimum and maximum values within its chunk of rows, which makes filtering efficient. For a query over a specific range (e.g., data from this week), DuckDB compares the filter against these min/max values and reads only the blocks that could contain matching rows. Skipping the rest significantly speeds up analytical queries that examine trends in large datasets. These are not row-level indexes, but the approach generally suits analytical workloads well. For more specific indexing needs, a separate Adaptive Radix Tree (ART) index can be added.
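
The min/max skipping described above can be illustrated with a small pure-Python sketch. This is a toy model, not DuckDB's implementation: the `Chunk` class, the `build_chunks` and `range_scan` helpers, and the chunk size of 100 are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    lo: int     # smallest value stored in this chunk
    hi: int     # largest value stored in this chunk
    rows: list  # the values themselves


def build_chunks(values, chunk_size):
    """Split values into fixed-size chunks and record each chunk's min/max."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        rows = values[start:start + chunk_size]
        chunks.append(Chunk(min(rows), max(rows), rows))
    return chunks


def range_scan(chunks, lo, hi):
    """Return values in [lo, hi] plus the number of chunks actually read."""
    matches = []
    scanned = 0
    for chunk in chunks:
        # The chunk's range does not overlap the filter: skip it unread.
        if chunk.hi < lo or chunk.lo > hi:
            continue
        scanned += 1
        matches.extend(v for v in chunk.rows if lo <= v <= hi)
    return matches, scanned


# Toy data: 1,000 "day numbers" in insertion order (an assumption; real
# data only benefits this much when it is roughly ordered on the column).
days = list(range(1_000))
chunks = build_chunks(days, chunk_size=100)

# "Recent data" query: only the last chunk overlaps the requested range.
matches, scanned = range_scan(chunks, 950, 999)
print(f"read {scanned} of {len(chunks)} chunks, found {len(matches)} rows")
```

In this sketch nine of the ten chunks are skipped without looking at their rows. The benefit shrinks when values are scattered randomly across chunks, because every chunk's min/max range then tends to overlap the filter.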

Play episode from 51:10
