

Building a Data Lake with Adam Ferrari
Feb 6, 2024
Adam Ferrari, SVP of Engineering at Starburst, discusses building a data lake analytics platform and the work happening at Starburst. The conversation covers the history and purpose of Starburst, the growing interest in data lakes, and the challenges of building and maintaining one. It also examines the scalability, performance, and architecture of Trino, the open-source project that forms the foundation of Starburst. Finally, it turns to the operational side of managing a data lake, including integrating with streaming services and keeping up with evolving lake formats.
Starburst as a Unified Data Platform
- Starburst is a data lake analytics platform, built on Trino, for analyzing structured data at scale.
- It lets users query both object storage and traditional structured databases through a single SQL interface.
Why Data Lakes Emerged
- Data lakes emerged because organizations needed to handle data at a scale beyond what traditional data warehouses could support.
- They offer cheap, scalable storage and more flexibility than the comparatively rigid data warehouse model.
Presto Changed Data Access
- Adam recalls how early SQL-on-Hadoop systems were slow and unreliable.
- He found Presto (the precursor to Trino) transformative because of its speed and reliability.