Machine Learning Street Talk (MLST)

Transformers Need Glasses! - Federico Barbero

Mar 8, 2025
Federico Barbero, lead author of the 'Transformers Need Glasses!' paper from DeepMind and Oxford, dives into the quirks of transformers and why large language models falter at tasks like counting and copying. He explains the architectural bottlenecks behind these failures. By drawing parallels with graph neural networks, he shows how the softmax function disperses attention and blurs the model's decisions. But not all hope is lost! Federico shares the 'glasses' that sharpen transformer performance, including input tweaks and architectural modifications that restore clarity.
INSIGHT

Token Detection Weakness

  • Transformers struggle to detect a single token within a long sequence.
  • This becomes increasingly problematic as context length grows, because the softmax weight available to any one token shrinks as the context fills up (see the toy sketch below).
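A minimal sketch of that dilution effect, assuming a toy single-head attention where one "needle" token gets a fixed logit advantage over an otherwise uniform context (the function name and numbers are illustrative, not from the episode): because softmax weights must sum to one, the share assigned to the needle shrinks as the sequence grows.

```python
import numpy as np

def needle_attention_weight(seq_len: int, needle_boost: float = 5.0) -> float:
    """Softmax weight on a single 'needle' token when every other token
    gets the same score (toy setting, hypothetical numbers)."""
    logits = np.zeros(seq_len)
    logits[0] = needle_boost               # the one token we want to detect
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax over the whole context
    return float(weights[0])

for n in (16, 256, 4_096, 65_536):
    print(f"context length {n:>6}: needle weight = {needle_attention_weight(n):.4f}")
```

Even with a large logit advantage, the needle's weight behaves like e^Δ / (e^Δ + n − 1), so the rest of the context eventually washes the single token out.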
INSIGHT

Representation Collapse

  • As sequences grow, transformer representations of different inputs converge until they become effectively indistinguishable.
  • Limited machine precision exacerbates this: once the gap falls below what the number format can represent, the model is forced into errors (see the sketch below).
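A hedged sketch of the collapse, using mean-pooled random token embeddings as a crude stand-in for a transformer's averaged value vectors (the pooling proxy, dimensions, and seed are assumptions for illustration): two sequences that differ only in their final token produce representations whose gap shrinks roughly like 1/n, and in a low-precision format that gap eventually falls below what the format can resolve.

```python
import numpy as np

def pooled_repr(tokens, dim=64, dtype=np.float32, seed=0):
    """Mean of the token embeddings: a crude stand-in for the averaged
    value vectors a transformer layer produces (illustrative only)."""
    rng = np.random.default_rng(seed)
    emb = rng.standard_normal((2, dim))        # embeddings for tokens 0 and 1
    return emb[np.asarray(tokens)].mean(axis=0).astype(dtype)

for n in (10, 1_000, 100_000):
    a = [1] * n + [1]                          # '1 1 ... 1 1'
    b = [1] * n + [0]                          # same prefix, different last token
    gap = np.abs(pooled_repr(a) - pooled_repr(b)).max()
    same16 = (pooled_repr(a, dtype=np.float16) == pooled_repr(b, dtype=np.float16)).mean()
    print(f"n={n:>7}  float32 gap={gap:.1e}  identical float16 coords={same16:.0%}")
```

Past some length the two inputs round to (nearly) the same low-precision representation, at which point the model must give the same answer to both, the "forced error" mentioned above.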
ANECDOTE

Copying Task Experiment

  • The experiment generated sequences of ones and zeros and asked the model to copy the last element (a rough reconstruction follows below).
  • Surprisingly, as the sequence grew, the model developed a blind spot for the most recent element.
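A rough reconstruction of that setup; the prompt wording and helper function below are hypothetical, not taken from the episode. The idea is simply to build a long bit sequence, ask for its final digit, and sweep the length upward to expose the blind spot.

```python
import random

def copy_last_prompt(length: int, seed: int = 0) -> tuple[str, str]:
    """Build a copy-the-last-element prompt from random bits.
    Returns (prompt, expected_answer). Wording is hypothetical."""
    rng = random.Random(seed)
    bits = [rng.choice("01") for _ in range(length)]
    prompt = (
        "Here is a sequence of digits: " + " ".join(bits) + "\n"
        "Reply with only the last digit of the sequence."
    )
    return prompt, bits[-1]

prompt, expected = copy_last_prompt(500)
print(prompt[:60] + " ...")
print("expected answer:", expected)
# To reproduce the blind-spot effect, increase `length` and compare the
# model's reply against `expected` at each size.
```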