The Complexity of the Large Language Model
The GPT-3 training procedure is to download a bunch of text from the internet and then make a model predict the next word. I think with this very simple next-word prediction objective on internet-scale data, what we have actually done is multitask a bunch of different objectives at once. We're seeing all these interesting facets and phenomena emerge as we train these models more optimally, scale them up, and better understand how to train them.
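As a rough illustration of what "predict the next word" means in code, here is a minimal toy sketch of the next-token prediction objective: shift the token sequence by one position and minimize cross-entropy between the model's predictions and the true next tokens. Everything here is an assumption for illustration only; the tiny vocabulary, the GRU stand-in for a Transformer, and the random "internet text" batch are placeholders, not GPT-3's actual architecture or data.

```python
# Toy sketch of next-token prediction training (NOT GPT-3; sizes and model are placeholders).
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64           # toy sizes; real LLMs use ~50k tokens and far larger dims

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a Transformer
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                    # logits: (batch, seq_len, vocab_size)

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab_size, (8, 128))  # pretend this is a batch of tokenized internet text
inputs, targets = batch[:, :-1], batch[:, 1:]   # shift by one: at each position, predict the next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```

The key idea the speaker points to is that this single loss, applied to text covering many domains and tasks, implicitly trains the model on many objectives at once.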