The Complexity of the Large Language Model
The GPT-3 training procedure is to download a bunch of text from the internet and then make a model predict the next word. I think with this very simple next-word prediction objective on internet-scale data, what we have actually done is multitask a bunch of different objectives at once. We're seeing all these interesting facets and phenomena emerge as we train these models more optimally, scale them up, and better understand how to train them.
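As a rough illustration of what "predict the next word" means in code, here is a minimal toy sketch of the next-token prediction objective: shift the token sequence by one position and minimize cross-entropy between the model's predictions and the true next tokens. Everything here is an assumption for illustration only; the tiny vocabulary, the GRU stand-in for a Transformer, and the random "internet text" batch are placeholders, not GPT-3's actual architecture or data.

```python
# Toy sketch of next-token prediction training (NOT GPT-3; sizes and model are placeholders).
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64           # toy sizes; real LLMs use ~50k tokens and far larger dims

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a Transformer
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                    # logits: (batch, seq_len, vocab_size)

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab_size, (8, 128))  # pretend this is a batch of tokenized internet text
inputs, targets = batch[:, :-1], batch[:, 1:]   # shift by one: at each position, predict the next token

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```

The key idea the speaker points to is that this single loss, applied to text covering many domains and tasks, implicitly trains the model on many objectives at once.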