LessWrong (Curated & Popular)

'Simulators' by Janus


GPT-2 - The Importance of Scalability

The earliest engagement with the hypothetical of "what if self-supervised sequence modeling actually works" that I know of is a terse post from 2019, Implications of GPT-2, by Gurkenglas: "The best models are the largest we were able to fit into GPU memory." Back to the text: "If this passes, we could immediately implement an HCH of AI safety researchers solving the problem, if it's within our reach at all."
