
GitHub and Google on Public Datasets & Google BigQuery (Changelog Interviews #209)
Changelog Master Feed
How to Capture Data From GitHub's Public API
In 2016, we had a similar problem, just bigger. It starts by taking a mirror of GitHub, using their public API and looking at GitHub's change history. Once you start mirroring this, you have a lot of files. And then the question becomes: where do I store them? Where can I analyze them? Where can I share them with other people? That's where Sean Pierce is a superstar: he wrote the pipeline that takes one mirror of GitHub and puts it inside BigQuery as relational tables.
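
To make the idea concrete, here is a minimal Python sketch of the first step described above: pulling public activity from GitHub's API and flattening it into table-shaped rows. It is not the pipeline mentioned in the episode, whose details are not given here. The function names, the choice of columns, and the use of the public events endpoint are assumptions made for illustration; newline-delimited JSON is used only because it is a format BigQuery can load.

```python
import json
import urllib.request

# GitHub's public events endpoint. Using it as the mirror source is an
# assumption for this sketch; the episode does not name specific endpoints.
EVENTS_URL = "https://api.github.com/events"


def fetch_public_events(url=EVENTS_URL):
    """Download one page of recent public events as a list of dicts.

    Unauthenticated requests are rate limited, so a real mirror would need
    authentication, paging, and retries, none of which are shown here.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "mirror-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def event_to_row(event):
    """Flatten one nested event into a flat row (hypothetical column names)."""
    return {
        "event_id": event["id"],
        "event_type": event["type"],
        "actor_login": event.get("actor", {}).get("login"),
        "repo_name": event.get("repo", {}).get("name"),
        "created_at": event["created_at"],
    }


def to_newline_delimited_json(rows):
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


if __name__ == "__main__":
    rows = [event_to_row(event) for event in fetch_public_events()]
    print(to_newline_delimited_json(rows))
```

Running the script prints one JSON object per event. Writing that output to a file gives something that can be loaded into a table, which is the "relational tables" shape the speaker describes.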