
Peter & Boris — Fine-tuning OpenAI's GPT-3
Gradient Dissent: Conversations on AI
Train Your Tokenizer for Different Languages?
A token is a sort of thing that we have about 50 thousand of. We map them onto sequences of characters, so that a common word like "hi" or "the" ends up being one token. That just makes it easier and more efficient for these language models to consume text. In principle, you can actually do it at the character level as well; it just gets very inefficient.

But I would think that might make foreign languages really hard. Like, for example, would Asian languages be impossible then, if they have far more tokens? Or I guess maybe you could argue they've sort of done the tokenization for you by
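For a rough sense of what that roughly 50-thousand-token vocabulary looks like in practice, here is a minimal sketch using the open-source tiktoken library (not mentioned in the episode, just an illustration); its "gpt2" encoding is the byte-pair-encoding vocabulary of the GPT-2/GPT-3 era.

    # Minimal sketch, assuming the `tiktoken` package is installed.
    # Loads the ~50k-token BPE vocabulary and shows that common English
    # words map to a single token, while rarer or non-English text
    # splits into several byte-level pieces.
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")
    print(enc.n_vocab)  # 50257 -- the "about 50 thousand" tokens

    for text in ["hi", "the", "tokenization", "東京"]:
        ids = enc.encode(text)
        pieces = [enc.decode_single_token_bytes(i) for i in ids]
        print(f"{text!r}: {len(ids)} token(s) -> {pieces}")

Common words come out as one token each, whereas the Japanese example is broken into multiple byte-level tokens, which is the efficiency concern raised in the question above.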