

Being Data Driven At Stripe With Trino And Iceberg
Jun 16, 2024
How Stripe uses Trino and Iceberg for its data lakehouse: business analytics, the challenges of large datasets, optimizing with Iceberg, and the transition to a REST catalog. The episode also covers query monitoring, managing a multi-tool ecosystem with Trino and Spark, and the challenges and innovations in cloud data management at Stripe.
Redshift to Trino Migration
- Stripe migrated from Redshift to Trino to handle increasing data scale and concurrency.
- Redshift struggled with thousands of concurrent queries, a bottleneck Trino's distributed nature overcame.
Hive vs. Iceberg
- Hive's directory-based partitioning and its reliance on S3 listing are costly and inefficient at large scale.
- Iceberg addresses these limitations, making it the better fit for large datasets on blob storage.
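
To make the listing cost concrete, here is a toy sketch (not Stripe's setup, and not the real Hive or Iceberg APIs). It assumes a Hive-style planner issues one LIST call per partition directory, while an Iceberg-style planner reads file paths from table metadata. `FakeObjectStore`, `DataFile`, and both `plan_*` functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical manifest entry: a file path plus its partition value."""
    path: str
    day: str


class FakeObjectStore:
    """Toy stand-in for S3 that counts LIST calls (an assumption, not a real client)."""

    def __init__(self, paths):
        self.paths = sorted(paths)
        self.list_calls = 0

    def list_prefix(self, prefix):
        self.list_calls += 1
        return [p for p in self.paths if p.startswith(prefix)]


def plan_hive_style(store, table_root, days):
    """Discover files by listing each partition directory: one LIST per partition."""
    files = []
    for day in days:
        files.extend(store.list_prefix(f"{table_root}/day={day}/"))
    return files


def plan_iceberg_style(manifest, days):
    """Discover files from table metadata: filter manifest entries, no listing."""
    wanted = set(days)
    return [f.path for f in manifest if f.day in wanted]


# Thirty daily partitions with three files each.
days = [f"2024-06-{d:02d}" for d in range(1, 31)]
paths = [f"s3://bucket/events/day={d}/part-{i}.parquet" for d in days for i in range(3)]

store = FakeObjectStore(paths)
manifest = [DataFile(p, p.split("day=")[1].split("/")[0]) for p in paths]

# Plan a query over the first week of data both ways.
hive_files = plan_hive_style(store, "s3://bucket/events", days[:7])
iceberg_files = plan_iceberg_style(manifest, days[:7])
```

Both planners return the same files, but in this model only the Hive-style one touches the object store, and its LIST count grows with the number of partitions scanned.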
REST Catalog for Migration
- Implement a REST catalog on top of your existing Hive Metastore.
- This allows transparent migration to Iceberg without disrupting users.
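
As a rough illustration of the idea, the sketch below puts a thin catalog facade in front of an in-memory stand-in for a Hive Metastore. It assumes that migrated tables carry `table_type=ICEBERG` and a `metadata_location` parameter in the metastore, which is the convention Iceberg's Hive catalog uses. `FakeHiveMetastore`, `RestCatalogFacade`, the status codes, and all table names and paths are hypothetical; the episode does not describe Stripe's implementation at this level.

```python
class FakeHiveMetastore:
    """Hypothetical in-memory stand-in for a Hive Metastore."""

    def __init__(self):
        self._tables = {}

    def put(self, db, name, params):
        self._tables[(db, name)] = dict(params)

    def get_params(self, db, name):
        return self._tables[(db, name)]  # raises KeyError if the table is unknown


class RestCatalogFacade:
    """Toy 'load table' handler that answers from the existing metastore.

    Returns (status, body) pairs; the status codes are illustrative choices,
    not taken from any specification.
    """

    def __init__(self, metastore):
        self.metastore = metastore

    def load_table(self, namespace, table):
        try:
            params = self.metastore.get_params(namespace, table)
        except KeyError:
            return 404, {"error": f"{namespace}.{table} not found"}
        if params.get("table_type", "").upper() != "ICEBERG":
            # Table still lives in the legacy Hive layout.
            return 409, {"error": f"{namespace}.{table} is not an Iceberg table yet"}
        return 200, {"metadata-location": params["metadata_location"]}


metastore = FakeHiveMetastore()
metastore.put("payments", "charges", {
    "table_type": "ICEBERG",
    "metadata_location": "s3://bucket/payments/charges/metadata/00003.metadata.json",
})
metastore.put("payments", "legacy_refunds", {"table_type": "EXTERNAL_TABLE"})

facade = RestCatalogFacade(metastore)
migrated = facade.load_table("payments", "charges")
legacy = facade.load_table("payments", "legacy_refunds")
missing = facade.load_table("payments", "nope")
```

Because clients only talk to the facade, a table can flip from the legacy layout to Iceberg in the metastore without any client configuration changing.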