Software Engineering Daily

Iceberg at Netflix and Beyond with Ryan Blue

Mar 7, 2024

47:37

Snipd AI

Discover the origin and benefits of Apache Iceberg, a format for managing big data tables efficiently. Learn about the collaboration with Apple and others to open-source Iceberg, and its adoption at companies like Airbnb and Lyft. Dive into the challenges of data migration and schema evolution that Iceberg addresses.


Podcast summary created with Snipd AI

Quick takeaways

Apache Iceberg simplifies big data analysis by enabling SQL tables that engines like Spark and Hive can safely use at the same time.

Iceberg supports correctness and efficiency by providing atomic operations, managed schema changes, and improvements to file layout.

Deep dives

Overview of Apache Iceberg and its Origins

Apache Iceberg is an open-source, high-performance format for huge data tables, started at Netflix by Ryan Blue and Dan Weeks. It enables the use of SQL tables for big data and lets engines like Spark and Hive safely work with the same tables at the same time. Since being open-sourced, it has been adopted at companies including Airbnb, Apple, and Lyft.

Highlights

Optimizing Database Operations with Iceberg and Spark (02:33)
Hive vs. Iceberg in Data Processing (00:44)
Indirection for SQL Schema Evolution in Iceberg (01:30)
Git Model and Database Transactions (01:42)

AI Chapters

Introduction (2 min)
Exploring Apache Iceberg: A Higher-Level Data Management Solution (9 min)
Optimization and Compatibility in Analytics Data Management (2 min)
Exploring Schema Evolution in Data Management with Unique IDs and Indirection (2 min)
Exploring Data Versioning and Time Travel in Immutable File Systems (3 min)
Comparison of Git and Iceberg Data Models for Petabyte Tables (3 min)
Exploring Data Changes with Iceberg (9 min)
Challenges and Solutions in Data Migration (18 min)

Episode notes

Apache Iceberg is an open-source, high-performance format for huge data tables. Iceberg enables the use of SQL tables for big data, while making it possible for engines like Spark and Hive to safely work with the same tables at the same time.

Iceberg was started at Netflix by Ryan Blue and Dan Weeks, and was open-sourced and donated to the Apache Software Foundation in November 2018. It has now been adopted at many other companies including Airbnb, Apple, and Lyft.

Ryan Blue joins the podcast to describe the origins of Iceberg, how it works, the problems it solves, collaborating with Apple and others to open-source it, and more.

This episode is hosted by Lee Atchison. Lee Atchison is a software architect, author, and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale (O’Reilly Media), is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments.

Lee is the host of Modern Digital Business, a podcast for people looking to build and grow their digital business with the help of modern applications and processes developed for today’s fast-moving business environment. Listen at mdb.fm. Follow Lee at softwarearchitectureinsights.com, and see all his content at leeatchison.com.


Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Iceberg at Netflix and Beyond with Ryan Blue appeared first on Software Engineering Daily.

