

Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663
Dec 26, 2023
In this discussion, Markus Nagel, a research scientist at Qualcomm AI Research, shares insights from his recent papers at NeurIPS 2023, focusing on machine learning efficiency. He tackles the challenges of quantizing transformers, particularly the activation outliers that arise in attention mechanisms. The conversation weighs the pros and cons of pruning versus quantization for compressing model weights and explores methods for multitask and multidomain learning. The use of geometric algebra to improve algorithms for robotics is also highlighted.
From CERN to Computer Vision
- Markus Nagel's first ML project involved beam loss monitoring at CERN, home to the Large Hadron Collider.
- His master's thesis in Amsterdam explored encoding diverse visual streams; the resulting paper was initially rejected at CVPR but later accepted at BMVC.
Startup to Qualcomm Research
- Markus Nagel's path to research included the startup Scyfer, co-founded by Professor Max Welling, Taco Cohen, and Tijmen Blankevoort.
- When Scyfer was acquired by Qualcomm in 2017, the acquisition brought him back into research.
Outliers in Transformer Quantization
- Transformers are difficult to quantize because of outliers in their activations, which force a trade-off between clipping error and rounding error (see the sketch after this list).
- These outliers stem from attention heads trying to perform a "no update" (no-op), a behavior the architecture does not represent explicitly.
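For readers unfamiliar with why a few outliers matter so much here, the toy Python sketch below illustrates the clipping-versus-rounding trade-off. It is an illustration written for these notes, not code from the episode or the paper; the constants, helper name, and clipping threshold are all assumptions.

```python
# Minimal sketch (assumed for illustration) of the clipping-vs-rounding trade-off
# that activation outliers create for uniform quantization.
import numpy as np


def quantize_dequantize(x, num_bits=8, clip_max=None):
    """Uniformly quantize x over a symmetric range and map back to floats.

    If clip_max is None, the range covers the largest |value| (no clipping error,
    but a coarse grid); otherwise the range is clipped to [-clip_max, clip_max]
    (a finer grid for typical values, at the cost of distorting outliers).
    """
    if clip_max is None:
        clip_max = np.max(np.abs(x))
    scale = 2 * clip_max / (2 ** num_bits - 1)   # step size of the uniform grid
    x_clipped = np.clip(x, -clip_max, clip_max)  # clipping error happens here
    return np.round(x_clipped / scale) * scale   # rounding error happens here


rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=10_000)  # stand-in for "typical" activations
acts[0] = 60.0                            # one large outlier

for clip_max, label in [(None, "full range"), (6.0, "clip at 6.0")]:
    deq = quantize_dequantize(acts, num_bits=8, clip_max=clip_max)
    typical_mse = np.mean((acts[1:] - deq[1:]) ** 2)  # rounding error on typical values
    outlier_err = (acts[0] - deq[0]) ** 2             # error on the outlier itself
    print(f"{label:>12}: typical MSE = {typical_mse:.2e}, outlier error = {outlier_err:.2e}")

# Keeping the outlier in range stretches the grid and inflates rounding error on
# every other activation; clipping it makes typical values far more accurate but
# introduces a large clipping error on the outlier.
```

The episode's point is that if attention heads can be helped to "do nothing" without producing these extreme activations in the first place, the quantizer no longer has to make this compromise.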