GitHub and Google on Public Datasets & Google BigQuery (Changelog Interviews #209)

Changelog Master Feed

BigQuery's Big Data Tables

The GitHub API spews out all these events, hundreds and hundreds per second, of public client records. And so what we're adding to that is the actual content that these events describe. So every commit now has the full message describing what people were modifying or removing in a repository. There's another table which has all the file contents. This is about three terabytes of raw data in BigQuery tables.
