Machine Learning Street Talk (MLST)

Transformers Need Glasses! - Federico Barbero

Mar 8, 2025
Federico Barbero, lead author of the 'Transformers Need Glasses!' paper from DeepMind and Oxford, dives into the quirks of transformers and why large language models falter at tasks like counting and copying. He explains the architectural bottlenecks behind these failures. By drawing parallels with graph neural networks, he shows how the softmax function disperses attention and blurs the model's decisions. But not all hope is lost! Federico shares the 'glasses' that sharpen transformer performance, including input tweaks and architectural modifications that restore clarity.
INSIGHT

Token Detection Weakness

  • Transformers struggle to detect a single token within a long sequence.
  • This becomes increasingly problematic as context length grows, because the softmax weight available to any one token shrinks as the context fills up (see the toy sketch below).
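A minimal sketch of that dilution effect, assuming a toy single-head attention where one "needle" token gets a fixed logit advantage over an otherwise uniform context (the function name and numbers are illustrative, not from the episode): because softmax weights must sum to one, the share assigned to the needle shrinks as the sequence grows.

```python
import numpy as np

def needle_attention_weight(seq_len: int, needle_boost: float = 5.0) -> float:
    """Softmax weight on a single 'needle' token when every other token
    gets the same score (toy setting, hypothetical numbers)."""
    logits = np.zeros(seq_len)
    logits[0] = needle_boost               # the one token we want to detect
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax over the whole context
    return float(weights[0])

for n in (16, 256, 4_096, 65_536):
    print(f"context length {n:>6}: needle weight = {needle_attention_weight(n):.4f}")
```

Even with a large logit advantage, the needle's weight behaves like e^Δ / (e^Δ + n − 1), so the rest of the context eventually washes the single token out.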
INSIGHT

Representation Collapse

  • As sequences grow, transformer representations of different inputs converge until they become effectively indistinguishable.
  • Limited machine precision exacerbates this: once the gap falls below what the number format can represent, the model is forced into errors (see the sketch below).
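A hedged sketch of the collapse, using mean-pooled random token embeddings as a crude stand-in for a transformer's averaged value vectors (the pooling proxy, dimensions, and seed are assumptions for illustration): two sequences that differ only in their final token produce representations whose gap shrinks roughly like 1/n, and in a low-precision format that gap eventually falls below what the format can resolve.

```python
import numpy as np

def pooled_repr(tokens, dim=64, dtype=np.float32, seed=0):
    """Mean of the token embeddings: a crude stand-in for the averaged
    value vectors a transformer layer produces (illustrative only)."""
    rng = np.random.default_rng(seed)
    emb = rng.standard_normal((2, dim))        # embeddings for tokens 0 and 1
    return emb[np.asarray(tokens)].mean(axis=0).astype(dtype)

for n in (10, 1_000, 100_000):
    a = [1] * n + [1]                          # '1 1 ... 1 1'
    b = [1] * n + [0]                          # same prefix, different last token
    gap = np.abs(pooled_repr(a) - pooled_repr(b)).max()
    same16 = (pooled_repr(a, dtype=np.float16) == pooled_repr(b, dtype=np.float16)).mean()
    print(f"n={n:>7}  float32 gap={gap:.1e}  identical float16 coords={same16:.0%}")
```

Past some length the two inputs round to (nearly) the same low-precision representation, at which point the model must give the same answer to both, the "forced error" mentioned above.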
ANECDOTE

Copying Task Experiment

  • The experiment generated sequences of ones and zeros and asked the model to copy the last element (a rough reconstruction follows below).
  • Surprisingly, as the sequence grew, the model developed a blind spot for the most recent element.
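A rough reconstruction of that setup; the prompt wording and helper function below are hypothetical, not taken from the episode. The idea is simply to build a long bit sequence, ask for its final digit, and sweep the length upward to expose the blind spot.

```python
import random

def copy_last_prompt(length: int, seed: int = 0) -> tuple[str, str]:
    """Build a copy-the-last-element prompt from random bits.
    Returns (prompt, expected_answer). Wording is hypothetical."""
    rng = random.Random(seed)
    bits = [rng.choice("01") for _ in range(length)]
    prompt = (
        "Here is a sequence of digits: " + " ".join(bits) + "\n"
        "Reply with only the last digit of the sequence."
    )
    return prompt, bits[-1]

prompt, expected = copy_last_prompt(500)
print(prompt[:60] + " ...")
print("expected answer:", expected)
# To reproduce the blind-spot effect, increase `length` and compare the
# model's reply against `expected` at each size.
```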