The Distributional Hypothesis of Large Language Models

Large language models learn to represent words by playing this next word prediction game where they have an input sequence or context window. They try to predict which is the word that follows and then they adjust the vector representation for words accordingly. These representations of words as vectors in a high dimensional vector space very finely captures distributional properties of words.

Play episode from 01:19:01

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app