Data Engineering Podcast

StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics

May 5, 2025
Sida Shen, a product manager at CelerData and a contributor to StarRocks, discusses high-performance analytical databases. He covers the origins of StarRocks and its evolution from an Apache Doris fork into a lakehouse query engine. Topics include handling high-concurrency, low-latency queries, bridging traditional OLAP with lakehouse architecture, and the importance of integrating with open table formats like Apache Iceberg. Sida also discusses the challenges of denormalization and real-time data processing in modern analytics.
INSIGHT

StarRocks Lakehouse Innovation

  • StarRocks is a high-performance lakehouse query engine forked from Apache Doris in 2020.
  • It supports on-the-fly joins, which removes the need for denormalization pipelines and keeps queries flexible and fast.
INSIGHT

Bridging Lakehouse and OLAP

  • StarRocks supports fast joins at query time, solving the denormalization challenges common in other OLAP systems.
  • It pairs open lakehouse formats with the low latency and high concurrency typical of data warehouses.
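
To make the denormalization point concrete, here is a minimal sketch of a query-time hash join in plain Python. It illustrates the general technique only and is not StarRocks code; the tables, column names, and the `revenue_by_region` helper are all hypothetical.

```python
# Hypothetical toy tables; not taken from the episode or from StarRocks.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 15.0},
]
customers = [
    {"customer_id": 10, "region": "EU"},
    {"customer_id": 11, "region": "US"},
]


def revenue_by_region(orders, customers):
    """Join orders to customers at query time, then sum amount per region."""
    # Build phase: index the smaller (dimension) table by the join key.
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = {}
    # Probe phase: look up each fact row and aggregate as we go.
    for order in orders:
        region = region_of.get(order["customer_id"])
        if region is None:
            continue  # inner join: skip orders with no matching customer
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


print(revenue_by_region(orders, customers))
```

Because the join runs when the query runs, a change to a customer's region shows up in the next query. A denormalized pipeline would instead have to rewrite every pre-joined order row for that customer.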
ADVICE

Simplify Scaling and Schema Design

  • Use automatic sharding in StarRocks to avoid manual data redistribution when scaling.
  • Prefer columnar storage over indexes for better OLAP performance in most cases.
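
The columnar advice can be illustrated with a small sketch. It assumes a toy in-memory table and says nothing about StarRocks' actual storage format; the data and both helper functions are hypothetical.

```python
# Hypothetical toy data in row-oriented form: one record per row.
rows = [
    {"user": "a", "country": "DE", "latency_ms": 120},
    {"user": "b", "country": "US", "latency_ms": 80},
    {"user": "c", "country": "DE", "latency_ms": 100},
]

# The same data in column-oriented form: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_latency_from_rows(rows):
    """Row layout: every full record is visited to read a single field."""
    return sum(row["latency_ms"] for row in rows) / len(rows)


def avg_latency_from_columns(columns):
    """Column layout: only the one column the query needs is read."""
    values = columns["latency_ms"]
    return sum(values) / len(values)


print(avg_latency_from_rows(rows), avg_latency_from_columns(columns))
```

Both functions return the same answer. The difference is how much data an aggregate has to touch, which is one reason analytical scans tend to favor columnar layouts over per-row index lookups.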