Training LLMs and Auto-Regressive Prediction

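LLMs are trained by taking a piece of text, masking some of its words, and training a neural net to predict the missing ones. For auto-regressive LLMs, the net is built so that it can only see the words to the left of the one it is predicting, which turns the training task into predicting the next word. For each prediction, the net produces a probability distribution over all possible words (tokens) in its vocabulary. To generate text, a word is sampled from this distribution, appended to the text, and fed back in as input to predict the word after it. This process is called auto-regressive prediction, which is why these models are referred to as auto-regressive LLMs.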
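The training and generation loop described above can be sketched in Python. This is a minimal illustration under loud assumptions: the corpus is a tiny made-up sentence set, and a bigram count table stands in for the neural net. The bigram table looks only at the previous word, whereas a real LLM conditions on the whole left context. All names (`train_bigram`, `next_word_distribution`, `generate`, and so on) are hypothetical and do not come from any library.

```python
import math
import random

# Hypothetical toy corpus and vocabulary; a real LLM uses a much larger token vocabulary.
CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()
VOCAB = sorted(set(CORPUS))


def next_word_pairs(tokens):
    """Training examples: each prefix (left context only) predicts the word that follows."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def train_bigram(tokens):
    """Stand-in for training a neural net (assumption): count which word follows which."""
    counts = {w: {v: 0 for v in VOCAB} for w in VOCAB}
    for context, target in next_word_pairs(tokens):
        counts[context[-1]][target] += 1
    return counts


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    m = max(logits.values())
    exps = {w: math.exp(s - m) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def next_word_distribution(counts, context):
    """Distribution over the whole vocabulary for the word after `context`.

    This toy model only looks at the last word of the context. Logits are
    log(count + 1), which amounts to add-one smoothing, so every word in the
    vocabulary gets a nonzero probability.
    """
    row = counts[context[-1]]
    return softmax({w: math.log(c + 1) for w, c in row.items()})


def generate(counts, prompt, n_words, rng):
    """Auto-regressive generation: sample a word, append it, repeat."""
    tokens = list(prompt)
    for _ in range(n_words):
        dist = next_word_distribution(counts, tokens)
        words = list(dist)
        word = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
        tokens.append(word)  # the sampled word becomes input for the next step
    return tokens


if __name__ == "__main__":
    model = train_bigram(CORPUS)
    print(generate(model, ["the"], 6, random.Random(0)))
```

During training, every prefix of the text serves as an example whose target is the next word, so a single sentence yields many training pairs. During generation, each sampled word is appended and fed back in, which is the auto-regressive step. Passing a seeded `random.Random` makes a run reproducible; because sampling is random, different seeds can produce different continuations from the same prompt.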
