
Data Brew by Databricks
Data Brew Season 1 Episode 3: Demystifying Delta Lake
Dec 6, 2020
In this podcast, Michael Armbrust, the creator of Spark SQL, discusses how Delta Lake was conceived and how it has evolved, how to query efficiently and troubleshoot slow queries, how to optimize performance and query speed, how partitioning and Z-ordering work, and exciting Delta Lake features for data ingestion and schema handling.
Duration: 25:51
Podcast summary created with Snipd AI
Quick takeaways
- Delta Lake offers full ACID transactions and time travel for reproducibility.
- Delta Lake builds on Spark's streaming file sink, extending it with a protocol that provides full ACID transactions.
Deep dives
Evolution from Data Lakes to Delta Lake
Michael Armbrust describes the engineering path from building data lakes to developing Delta Lake. He highlights the problems users ran into when running Spark in the cloud, particularly the limited scalability of metadata handling. The idea of a scalable transaction log grew out of these challenges, and its potential became clear when one of Databricks' largest customers wanted to ingest petabytes of data per week into a table and query it in real time. This led to Delta Lake, which offers full ACID transactions and time travel for reproducibility.
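To make the idea of a transaction log more concrete, here is a minimal toy sketch, not Delta Lake's actual implementation. It assumes a local filesystem and loosely mirrors Delta's on-disk layout: a `_delta_log` directory of zero-padded, numbered JSON commit files holding `add` and `remove` actions. The class name `ToyTransactionLog` and its methods are hypothetical. Each commit becomes visible atomically and only if no other writer has already claimed that version (optimistic concurrency). Replaying the commits up to a chosen version reconstructs the table as of that point, which is the essence of time travel. Real Delta Lake additionally handles checkpoints, read/write conflict detection, schema and protocol metadata, and cloud object stores.

```python
from __future__ import annotations

import json
import os
import tempfile


class ToyTransactionLog:
    """Hypothetical, simplified model of a log-structured table (not Delta Lake's code)."""

    def __init__(self, table_dir: str) -> None:
        self.log_dir = os.path.join(table_dir, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version: int) -> str:
        # Commit files are named by a zero-padded version number, e.g. 00000000000000000000.json.
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self) -> int:
        versions = [int(name.split(".")[0])
                    for name in os.listdir(self.log_dir) if name.endswith(".json")]
        return max(versions, default=-1)  # -1 means the table has no commits yet

    def commit(self, actions: list[dict], read_version: int) -> int:
        """Write version read_version + 1 atomically; fail if another writer got there first."""
        version = read_version + 1
        # Write the full commit to a temp file first so readers never see a partial commit.
        tmp_fd, tmp_path = tempfile.mkstemp(dir=self.log_dir, suffix=".tmp")
        with os.fdopen(tmp_fd, "w") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        try:
            # os.link raises FileExistsError if the target exists: put-if-absent semantics.
            os.link(tmp_path, self._path(version))
        except FileExistsError:
            raise RuntimeError(f"conflict: version {version} already committed") from None
        finally:
            os.remove(tmp_path)
        return version

    def snapshot(self, version: int | None = None) -> set[str]:
        """Replay commits 0..version to get the live data files (time travel)."""
        if version is None:
            version = self.latest_version()
        files: set[str] = set()
        for v in range(version + 1):
            with open(self._path(v)) as f:
                for line in f:
                    action = json.loads(line)
                    if "add" in action:
                        files.add(action["add"]["path"])
                    elif "remove" in action:
                        files.discard(action["remove"]["path"])
        return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = ToyTransactionLog(d)
        v0 = log.commit([{"add": {"path": "part-0.parquet"}}], read_version=-1)
        v1 = log.commit([{"add": {"path": "part-1.parquet"}}], read_version=v0)
        log.commit([{"remove": {"path": "part-0.parquet"}}], read_version=v1)
        print(sorted(log.snapshot()))    # ['part-1.parquet']
        print(sorted(log.snapshot(v1)))  # ['part-0.parquet', 'part-1.parquet']
        try:
            # A stale writer that read version 1 loses the race for version 2.
            log.commit([{"add": {"path": "part-2.parquet"}}], read_version=v1)
        except RuntimeError as e:
            print(e)                     # conflict: version 2 already committed
```

The design choice worth noting is that every change, including a delete, is an appended record rather than an in-place mutation. That append-only history is what makes both atomic commits and reproducible reads of older versions cheap to support.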