Exploring Nessie: A Git-like Versioned Catalog for Data Lakes
In this chapter, Tobias Macey interviews Alex Merced, a developer advocate at Dremio, about the Nessie project, a Git-like versioned catalog for data lakes built on Apache Iceberg. They discuss Nessie's core functions, how it compares with lakeFS, its role in data lakehouse environments, and its versioning model and capabilities, including integration with Apache Iceberg and maintenance tasks such as pruning old versions and running table compactions.
Data lakehouse architectures are gaining popularity because of the flexibility and cost-effectiveness they offer. The catalog is the link that bridges the gap between data lake and warehouse capabilities. Its primary purpose is to tell the query engine what data exists and where it lives, but the Nessie project aims to go beyond that simple utility. In this episode, Alex Merced explains how Nessie's branching and merging functionality lets you apply the versioning semantics you already know from Git to your data lakehouse.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA