Git for Data: Managing Data like Code with lakeFS

Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Git's Data Model

When we're storing source code, we're talking about megabytes. A Merkle tree is a tree of cryptographic hashes, so what it knows about the data is whether something got changed, because it stores hashes. Copy-on-write takes place when the data is written back to our storage, and that is when the Merkle tree gets updated: something changed in the storage, in the way the file is represented on disk, and that is how we know this file got changed. Source code is relatively small. Binaries can sometimes be a little larger because of the whole process that goes into producing them, but mostly it's nicely compressible text.
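The change detection and copy-on-write update described above can be sketched as a tiny content-addressed store. This is a minimal, one-level sketch, assuming SHA-256 digests and a plain Python dict as storage. The names `put`, `put_tree`, `read_tree`, `build_tree`, `write_file`, and `changed_files` are hypothetical, and the layout is not Git's or lakeFS's actual object format.

```python
import hashlib
from typing import Dict, Set

# Hypothetical in-memory object store: digest -> serialized object.
# Objects are never modified in place, so old versions stay readable.
Store = Dict[str, bytes]


def put(store: Store, data: bytes) -> str:
    """Store bytes under their SHA-256 digest (content addressing) and return it."""
    digest = hashlib.sha256(data).hexdigest()
    store.setdefault(digest, data)  # identical content is stored only once
    return digest


def put_tree(store: Store, entries: Dict[str, str]) -> str:
    """Hash the sorted (name, leaf-digest) pairs: the root depends on every leaf."""
    serialized = "\n".join(f"{name} {h}" for name, h in sorted(entries.items()))
    return put(store, serialized.encode())


def read_tree(store: Store, root: str) -> Dict[str, str]:
    """Decode a tree object back into {name: leaf digest}."""
    lines = store[root].decode().splitlines()
    # Digests never contain spaces, so split on the last space only.
    return dict(line.rsplit(" ", 1) for line in lines)


def build_tree(store: Store, files: Dict[str, bytes]) -> str:
    """Hash every file (the leaves), then hash the listing (the root)."""
    return put_tree(store, {name: put(store, data) for name, data in files.items()})


def write_file(store: Store, root: str, name: str, data: bytes) -> str:
    """Copy-on-write: return a NEW root; the old root and unchanged leaves are reused."""
    entries = read_tree(store, root)
    entries[name] = put(store, data)
    return put_tree(store, entries)


def changed_files(store: Store, old_root: str, new_root: str) -> Set[str]:
    """Equal roots mean nothing changed; otherwise compare leaf digests by name."""
    if old_root == new_root:
        return set()
    old, new = read_tree(store, old_root), read_tree(store, new_root)
    return {n for n in old.keys() | new.keys() if old.get(n) != new.get(n)}


store: Store = {}
v1 = build_tree(store, {"app.py": b"print('hi')\n", "README.md": b"# demo\n"})
v2 = write_file(store, v1, "app.py", b"print('hello')\n")
print(changed_files(store, v1, v2))  # {'app.py'}
print(len(store))                    # 5: README.md's blob is shared by v1 and v2
```

Because the root digest covers every leaf digest, comparing two roots answers "did anything change?" in a single comparison, and the leaves only need inspecting on a mismatch. Writing never overwrites an existing object, so both versions stay addressable by their roots.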

Excerpt begins at 05:35 in the episode.