Data Engineering Podcast

Being Data Driven At Stripe With Trino And Iceberg

Jun 16, 2024
Learn how Stripe uses Trino and Iceberg for its data lakehouse, including business analytics, the challenges of large datasets, optimizing with Iceberg, and transitioning to a REST catalog. The episode also covers the advantages of monitoring queries and of managing a multi-tool ecosystem with Trino and Spark.
ANECDOTE

Redshift to Trino Migration

  • Stripe migrated from Redshift to Trino to handle increasing data scale and concurrency.
  • Redshift struggled with thousands of concurrent queries, a bottleneck Trino's distributed nature overcame.
INSIGHT

Hive vs. Iceberg

  • Hive tables discover data files by listing partition directories, and on S3 that listing is slow and costly at large scale.
  • Iceberg addresses these limitations, making it a superior choice for large datasets on blob storage.
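
To make the listing cost concrete, here is a toy Python sketch. It is not Stripe's code and uses neither project's API. It models blob storage as an in-memory set and counts LIST requests: Hive-style planning lists every partition prefix, while Iceberg-style planning takes file paths from table metadata. The manifest is simplified to a plain list here, and every name (`FakeObjectStore`, `plan_hive_style`, `plan_iceberg_style`, the `tbl/` paths) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeObjectStore:
    """Toy stand-in for a blob store; counts LIST requests."""
    objects: set[str] = field(default_factory=set)
    list_calls: int = 0

    def list_prefix(self, prefix: str) -> list[str]:
        self.list_calls += 1
        return sorted(k for k in self.objects if k.startswith(prefix))


def plan_hive_style(store, table_root, partitions):
    """Hive-style: one LIST per partition directory to discover files."""
    files = []
    for part in partitions:
        files.extend(store.list_prefix(f"{table_root}/{part}/"))
    return files


def plan_iceberg_style(manifest, wanted):
    """Iceberg-style: paths and partition values come from metadata."""
    return sorted(e["path"] for e in manifest if e["partition"] in wanted)


# Thirty made-up daily partitions, one data file each.
partitions = [f"day=2024-06-{d:02d}" for d in range(1, 31)]
store = FakeObjectStore({f"tbl/{p}/part-0.parquet" for p in partitions})
manifest = [
    {"path": f"tbl/{p}/part-0.parquet", "partition": p} for p in partitions
]

hive_files = plan_hive_style(store, "tbl", partitions)
iceberg_files = plan_iceberg_style(manifest, set(partitions))

print("LIST calls for Hive-style planning:", store.list_calls)
print("Same files found:", hive_files == iceberg_files)
```

Both planners find the same files, but the Hive-style path issues one LIST per partition, so its request count grows with the number of partitions scanned. A real Iceberg reader still has to fetch metadata files, which this sketch leaves out.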
ADVICE

REST Catalog for Migration

  • Implement a REST catalog on top of your existing Hive Metastore.
  • This allows transparent migration to Iceberg without disrupting users.
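
One possible reading of this advice, sketched as a toy in Python: clients always resolve tables by name through a single catalog service, so the format behind a name can change without client changes. This is an illustration under stated assumptions, not Stripe's implementation and not the Iceberg REST catalog API. There is no HTTP layer, the metastore is a plain dict, and `TableRecord`, `CatalogFacade`, `migrate_to_iceberg`, and the table name and bucket paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableRecord:
    """Hypothetical metastore entry: table location and storage format."""
    location: str
    table_format: str  # "hive" or "iceberg"


class CatalogFacade:
    """Toy stand-in for a catalog service fronting an existing metastore."""

    def __init__(self, metastore: dict[str, TableRecord]):
        self._metastore = metastore

    def load_table(self, name: str) -> TableRecord:
        # Clients always ask by name; the backing format is a server-side detail.
        return self._metastore[name]

    def migrate_to_iceberg(self, name: str, new_location: str) -> None:
        # Swap the record in place; the name clients use does not change.
        self._metastore[name] = TableRecord(new_location, "iceberg")


# Made-up table name and locations, for illustration only.
metastore = {
    "example_db.events": TableRecord("s3://example-bucket/hive/events", "hive")
}
catalog = CatalogFacade(metastore)

before = catalog.load_table("example_db.events")
catalog.migrate_to_iceberg("example_db.events", "s3://example-bucket/iceberg/events")
after = catalog.load_table("example_db.events")

print(before.table_format, "->", after.table_format)
```

The client-side call, `load_table("example_db.events")`, is identical before and after the migration, which is the sense in which the change is transparent to users.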