Data Archives - Software Engineering Daily

Building a Data Lake with Adam Ferrari

Feb 6, 2024
Adam Ferrari, SVP of Engineering at Starburst, discusses building a data lake analytics platform and the work happening at Starburst. The conversation covers the history and purpose of Starburst, the growing interest in data lakes, and the challenges of building and maintaining one. It also covers the scalability, performance, and architecture of Trino, the open-source project that forms the foundation of Starburst, and closes with the challenges of managing a data lake, including integrating with streaming services and keeping up with evolving lake formats.
INSIGHT

Starburst as Unified Data Platform

  • Starburst is a data lake analytics platform designed for structured data at scale using Trino.
  • It unifies querying across object storage and traditional structured databases with SQL.
INSIGHT

Why Data Lakes Emerged

  • Data lakes emerged to meet scale requirements beyond what traditional data warehouses could handle.
  • They provide cheap, scalable storage and more flexibility than the comparatively rigid data warehouse model.
ANECDOTE

Presto Changed Data Access

  • Adam recalls how early SQL-on-Hadoop systems were slow and unreliable.
  • He found Presto (the precursor to Trino) transformative due to its speed and reliability.