
Episode 538: Roberto Di Cosmo on Archiving Public Software at Massive Scale
Software Engineering Radio - the podcast for professional software developers
How to Scale Up a Project on Gitadra
Gitadra uses a Merkle graph to keep track of files and directories. It can spot when two file contents are the same, for example. Using these properties, using this cryptographic identifier, they can spot exactly what is going on. We actually manage to compress and deduplicate everything at all levels. So if a file is used in different projects, we keep it only once. Instead of 300 petabytes, we have only one petabyte, because we avoid copying and duplicating the same file over and over again.
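
To make the deduplication idea concrete, here is a minimal sketch of a content-addressed store in Python. It is an illustration only, not the archive's actual implementation: the `ContentStore` class, its method names, the `file`/`dir` prefixes, and the choice of SHA-256 are all assumptions made for this example. Each file is stored under a hash of its content, and each directory is stored under a hash of its entries' names and identifiers, so identical content at any level ends up under the same key.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps identifier -> stored bytes. Identical content maps to one key.
        self.objects = {}

    def put_file(self, data: bytes) -> str:
        # The identifier depends only on the content, never on a path or project.
        ident = hashlib.sha256(b"file\n" + data).hexdigest()
        self.objects.setdefault(ident, data)
        return ident

    def put_directory(self, entries: dict) -> str:
        # A directory is described by its sorted (name, child identifier) pairs,
        # so its identifier changes whenever anything beneath it changes.
        manifest = "\n".join(
            f"{name} {ident}" for name, ident in sorted(entries.items())
        ).encode()
        ident = hashlib.sha256(b"dir\n" + manifest).hexdigest()
        self.objects.setdefault(ident, manifest)
        return ident


store = ContentStore()
license_text = b"MIT License\n"

# Two projects that share the same license file but differ in their code.
project_a = store.put_directory({
    "LICENSE": store.put_file(license_text),
    "main.py": store.put_file(b"print('a')\n"),
})
project_b = store.put_directory({
    "LICENSE": store.put_file(license_text),
    "main.py": store.put_file(b"print('b')\n"),
})

# Re-adding an identical directory produces the identifier it already has.
project_a_again = store.put_directory({
    "LICENSE": store.put_file(license_text),
    "main.py": store.put_file(b"print('a')\n"),
})
```

In this sketch the shared license file is stored once even though it appears in both projects, and re-adding an unchanged directory adds nothing new to the store. That is the same effect, at toy scale, as keeping a file only once when it is used in different projects.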