Data Archives - Software Engineering Daily

Building a Data Lake with Adam Ferrari

Feb 6, 2024

Adam Ferrari, SVP of Engineering at Starburst, discusses building a data lake analytics platform and the work happening at Starburst. The conversation covers the history and purpose of Starburst, the growing interest in data lakes, and the challenges of building and maintaining one. It also covers the scalability, performance, and architecture of Trino, the open-source project that forms the foundation of Starburst. It closes with the challenges of managing a data lake, including integrating with streaming services and keeping up with evolving lake formats.

INSIGHT

Starburst as Unified Data Platform

Starburst is a data lake analytics platform built on Trino and designed for querying structured data at scale.
It lets users query object storage and traditional structured databases together through a single SQL interface.
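
As a rough illustration of what unified querying can look like, here is a minimal Python sketch that assembles one SQL statement joining a table in object storage with a table in a relational database. It relies on Trino's three-part `catalog.schema.table` naming. The catalog, schema, table, and column names (`lake`, `crm`, `events`, `customers`, and so on) are hypothetical and do not come from the episode, and the sketch only builds the query text; it does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Trino-style three-part table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(lake_table: str, db_table: str) -> str:
    """Build a SQL join across two tables that live in different catalogs.

    The column names (customer_id, id, region) are assumptions made
    for illustration only.
    """
    return (
        "SELECT c.region, count(*) AS events\n"
        f"FROM {lake_table} AS e\n"
        f"JOIN {db_table} AS c ON e.customer_id = c.id\n"
        "GROUP BY c.region"
    )


# Hypothetical catalogs: "lake" backed by object storage,
# "crm" backed by a relational database.
LAKE_EVENTS = qualified("lake", "web", "events")
CRM_CUSTOMERS = qualified("crm", "public", "customers")
QUERY = build_federated_query(LAKE_EVENTS, CRM_CUSTOMERS)

if __name__ == "__main__":
    print(QUERY)
```

The point of the sketch is that both sources appear in the same statement, distinguished only by their catalog prefix, so the query engine rather than the user handles where each table physically lives.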

INSIGHT

Why Data Lakes Emerged

Data lakes emerged because data volumes grew beyond what traditional data warehouses could handle.
They provide cheap, scalable storage and more flexibility than the comparatively rigid data warehouse model.

ANECDOTE

Presto Changed Data Access

Adam recalls how early SQL-on-Hadoop systems were slow and unreliable.
He found Presto (the precursor to Trino) transformative due to its speed and reliability.
