Data Brew by Databricks

Data Brew Season 1 Episode 3: Demystifying Delta Lake

Dec 6, 2020
In this episode, Michael Armbrust, the creator of Spark SQL, discusses the conception and evolution of Delta Lake, efficient querying and troubleshooting of slow queries, optimizing query performance, partitioning and Z-Order, and upcoming features for data ingestion and schema handling in Delta Lake.
25:51

Quick takeaways

  • Delta Lake offers full ACID transactions and time travel for reproducibility.
  • Delta Lake builds on Spark's streaming file sink, adding a protocol that provides full ACID transactions.

Deep dives

Evolution from Data Lakes to Delta Lake

Michael Armbrust discusses the engineering journey from building data lakes to the development of Delta Lake. He highlights the struggles faced by users when using Spark in the cloud and the scalability limitations around metadata. The idea of a scalable transaction log was born from these challenges, and its potential was realized when one of Databricks' largest customers proposed the idea of ingesting petabytes of data per week into a table for real-time querying. This led to the development of Delta Lake, which offers features like full ACID transactions and time travel for reproducibility.
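
To make the connection between a transaction log and time travel concrete, here is a minimal toy sketch in Python. It is not Delta Lake's actual protocol or API: the class name `ToyTable`, the `_toy_log` directory, and the file names are all hypothetical, chosen only for illustration. The idea it shows is that if every change is recorded as a numbered, append-only commit, then any past version of the table can be rebuilt by replaying the log up to that commit.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy append-only commit log. Illustrative only, NOT Delta Lake's real protocol."""

    def __init__(self, root):
        # Hypothetical log directory name, chosen for this sketch.
        self.log_dir = os.path.join(root, "_toy_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _commit_path(self, version):
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def commit(self, add=(), remove=()):
        """Record one atomic change: data files added and data files removed."""
        version = self.latest_version() + 1
        # Mode "x" raises FileExistsError if the commit file already exists,
        # so two writers cannot both claim the same version number.
        with open(self._commit_path(version), "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return version

    def snapshot(self, version=None):
        """Replay commits 0..version to get the files visible at that version."""
        if version is None:
            version = self.latest_version()
        files = set()
        for v in range(version + 1):
            with open(self._commit_path(v)) as f:
                actions = json.load(f)
            files |= set(actions["add"])
            files -= set(actions["remove"])
        return sorted(files)


table = ToyTable(tempfile.mkdtemp())
v0 = table.commit(add=["part-0.parquet"])
v1 = table.commit(add=["part-1.parquet"])
v2 = table.commit(add=["part-2.parquet"], remove=["part-0.parquet"])

print(table.snapshot())   # files visible at the latest version
print(table.snapshot(0))  # "time travel" back to the first commit
```

Readers see either all of a commit or none of it, because a commit becomes visible only once its single log file exists. A real system has to handle much more than this sketch does, such as concurrent writers retrying, log compaction, and schema metadata.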