Programming Throwdown

172: Transformers and Large Language Models

Mar 11, 2024
Exploring transformers and large language models, covering latent variables, encoders, decoders, attention layers, and the history of RNNs. Also discussing the vanishing gradient problem that LSTMs were designed to mitigate, self-supervised learning, and direct preference optimization (DPO).
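Since attention layers are central to the episode, here is a minimal NumPy sketch of standard scaled dot-product attention as described in "Attention Is All You Need" (Vaswani et al., 2017). This is not code from the episode; the function name, shapes, and toy inputs are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (seq_len_q, d_k) queries
    K: (seq_len_k, d_k) keys
    V: (seq_len_k, d_v) values
    """
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled so softmax
    # gradients stay well-behaved as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy self-attention: 3 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

In a full transformer this operation runs across multiple heads with learned projections of the input, but the weighted-average core shown here is the same.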