
Data Engineering Podcast

Version Your Data Lakehouse Like Your Software With Nessie

Mar 10, 2024
Learn how the Nessie project bridges data lake and data warehouse capabilities with versioning semantics similar to Git. Explore effective versioning and branching strategies, the architecture of Nessie, and its future development plans. Discover the advantages of using Nessie to version data across multi-table transactions in a data lakehouse.
Duration: 40:55


Podcast summary created with Snipd AI

Quick takeaways

  • Nessie provides Git-like versioning for data lakes, enabling disaster recovery and new data ops practices.
  • Nessie simplifies rolling back data and creating isolated branch environments, which streamlines testing and reduces storage costs because a branch does not copy the underlying data.

Deep dives

Nessie Project Overview

Nessie is a Git-like versioned catalog for data lakes that use Apache Iceberg tables. Its key feature is branching and committing at the catalog level rather than table by table. This brings Git-like semantics to data ops practices such as disaster recovery, changes how developers interact with their data, and enables new data lakehouse patterns.
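To make the catalog-level idea concrete, here is a minimal Python sketch under a deliberately simplified model: each commit records a mapping from table names to pointers to Iceberg metadata files, and a branch is nothing more than a name that points at a commit. The class and method names (`VersionedCatalog`, `create_branch`, `commit`, `merge`, `reset`) are hypothetical and are not Nessie's API; the real project runs as a server with its own storage and a richer merge model. The sketch only illustrates why branching is cheap, why a commit touching several tables is all-or-nothing, and how rollback can be a pointer move.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass
class Commit:
    parent: str | None       # id of the previous commit, None for the root
    tables: dict[str, str]   # table name -> (hypothetical) metadata-file pointer
    message: str


class VersionedCatalog:
    """Toy Git-like catalog: immutable commits plus named branch pointers."""

    def __init__(self) -> None:
        self.commits: dict[str, Commit] = {}
        root_id = self._store(Commit(parent=None, tables={}, message="init"))
        self.branches: dict[str, str] = {"main": root_id}

    def _store(self, commit: Commit) -> str:
        # Content-address the commit, in the spirit of Git object ids.
        payload = json.dumps(
            {"parent": commit.parent, "tables": commit.tables, "message": commit.message},
            sort_keys=True,
        )
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = commit
        return commit_id

    def tables(self, branch: str) -> dict[str, str]:
        """Return the table -> pointer mapping visible at the head of a branch."""
        return dict(self.commits[self.branches[branch]].tables)

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch is just a name for a commit id: no table data is copied.
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str, updates: dict[str, str], message: str) -> str:
        # Every table in `updates` changes in one new commit, so a reader of
        # the branch sees either none or all of them (a multi-table commit).
        head = self.branches[branch]
        new_tables = {**self.commits[head].tables, **updates}
        commit_id = self._store(Commit(parent=head, tables=new_tables, message=message))
        self.branches[branch] = commit_id
        return commit_id

    def log(self, branch: str) -> list[str]:
        """Commit messages from the branch head back to the root."""
        messages: list[str] = []
        cursor: str | None = self.branches[branch]
        while cursor is not None:
            commit = self.commits[cursor]
            messages.append(commit.message)
            cursor = commit.parent
        return messages

    def _ancestors(self, commit_id: str) -> set[str]:
        seen: set[str] = set()
        cursor: str | None = commit_id
        while cursor is not None:
            seen.add(cursor)
            cursor = self.commits[cursor].parent
        return seen

    def merge(self, source: str, target: str) -> None:
        # Fast-forward only: allowed when target's head is an ancestor of source's head.
        if self.branches[target] not in self._ancestors(self.branches[source]):
            raise ValueError("branches diverged; this sketch only fast-forwards")
        self.branches[target] = self.branches[source]

    def reset(self, branch: str, commit_id: str) -> None:
        # Rollback: point the branch back at an earlier commit. Nothing is
        # deleted, so the newer commits remain reachable by id.
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.branches[branch] = commit_id
```

For example, creating an `etl` branch, committing new pointers for two tables there, and then fast-forwarding `main` publishes both tables in a single step; calling `reset` on `main` with the id of an earlier commit undoes that publication, which is the disaster-recovery pattern described above in miniature.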
