GitHub and Google on Public Datasets & Google BigQuery (Changelog Interviews #209)

Changelog Master Feed

How to Capture Data From GitHub's Public API

In 2016, we had a similar problem, just bigger. It starts by taking a mirror of GitHub, using their public API and looking at GitHub's change history. Once you start mirroring this, you have a lot of files. And then the question becomes: where do I store them? Where can I analyze them? Where can I share them with other people? That's where Sean Pierce is a superstar who writes this pipeline to take one mirror of GitHub and then put it inside BigQuery as relational tables.
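
To make the "relational tables" step concrete, here is a minimal sketch of flattening nested API records into flat rows and serializing them as newline-delimited JSON, a format BigQuery can load. It is not the pipeline described in the episode. The field names follow the general shape of GitHub's public events API, the sample record is made up for illustration, and `flatten_event` and `to_ndjson` are hypothetical helper names.

```python
import json

def flatten_event(event):
    """Turn one nested event record into a flat row (one column per key)."""
    return {
        "event_id": event["id"],
        "event_type": event["type"],
        "actor_login": event["actor"]["login"],
        "repo_name": event["repo"]["name"],
        "created_at": event["created_at"],
    }

def to_ndjson(events):
    """Serialize flat rows as newline-delimited JSON, one row per line."""
    return "\n".join(json.dumps(flatten_event(e), sort_keys=True) for e in events)

# Made-up sample record, shaped like an entry from a public events feed.
sample_events = [
    {
        "id": "1",
        "type": "PushEvent",
        "actor": {"login": "octocat"},
        "repo": {"name": "octocat/Hello-World"},
        "created_at": "2016-06-29T00:00:00Z",
    }
]

rows = [flatten_event(e) for e in sample_events]
ndjson = to_ndjson(sample_events)
print(ndjson)
```

A real mirror would fetch records over HTTP, handle pagination and rate limits, and load the resulting file into a table. Those steps are left out here because the excerpt does not describe them.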
