
759: Full Encoder-Decoder Transformers Fully Explained, with Kirill Eremenko
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Comparison of Encoder-only Models and Decoder-only Models in Transformers
The chapter explores the distinctions between encoder-only models like BERT and decoder-only models like GPT: BERT produces text representations suited to classification tasks, while GPT generates text. It then discusses why masking matters in generative tasks such as stock-price prediction, where the model must be prevented from memorizing, i.e. simply reading off, the future values it is supposed to predict, as well as the advantages of full encoder-decoder transformers for classification and technical details such as layer stacking and masking within transformer models.
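Not from the episode itself, but a minimal NumPy sketch of the causal-masking idea the chapter describes: each position in a decoder may attend only to earlier positions, so the model cannot read the answer (the next token or next price) directly from its input. The function name and shapes are illustrative assumptions, not the episode's code.

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Mask out future positions in a (seq_len, seq_len) attention
    score matrix, then normalize each row with a softmax."""
    seq_len = scores.shape[0]
    # Entries strictly above the diagonal correspond to "future" positions.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    # Setting future scores to -inf gives them zero weight after softmax.
    masked = np.where(future, -np.inf, scores)
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.random.randn(4, 4)
weights = causal_attention_weights(scores)
print(np.round(weights, 3))  # row i has zero weight on every column j > i
```

At training time this is what keeps next-step prediction honest: without the mask, a decoder trained with teacher forcing could copy the target straight from the attended input rather than learning to predict it.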