The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663

Dec 26, 2023
In this discussion, Markus Nagel, a research scientist at Qualcomm AI Research, shares insights from his recent papers at NeurIPS 2023, focusing on machine learning efficiency. He tackles the challenges of quantizing transformers, particularly the outlier problems that arise in attention mechanisms. The conversation explores the pros and cons of pruning versus quantization for compressing model weights and dives into methods for multitask and multidomain learning. It also highlights the use of geometric algebra to improve algorithms for robotics.
AI Snips
ANECDOTE

From CERN to Computer Vision

  • Markus Nagel's first ML project involved beam loss monitoring at CERN, home to the Large Hadron Collider.
  • His master's thesis in Amsterdam explored encoding diverse visual streams; the work was initially rejected at CVPR but later accepted at BMVC.
ANECDOTE

Startup to Qualcomm Research

  • Markus Nagel's path to research ran through a startup, Scyfer, co-founded by Professor Max Welling, along with colleagues such as Taco Cohen and Tijmen Blankevoort.
  • When Qualcomm acquired the startup in 2017, he returned to full-time research.
INSIGHT

Outliers in Transformer Quantization

  • Transformers are difficult to quantize because of outliers in their activations, which force a trade-off between clipping error and rounding error (illustrated in the sketch below).
  • These outliers stem from attention heads that effectively want to perform "no update": softmax cannot output exactly zero attention, so heads push pre-softmax values to extreme magnitudes to approximate it, and those extreme values become activation outliers.
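A minimal sketch of why an outlier forces this trade-off, assuming standard uniform symmetric quantization in NumPy (an illustration, not code from the papers): covering the outlier coarsens the grid for every other value, while a tighter range clips the outlier hard.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Uniform symmetric quantization: snap to a grid of step `scale`, then clip to the int range."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Activations with a single large outlier, as often seen in transformer layers.
acts = np.concatenate([np.random.randn(1000), [60.0]])

# Wide range: the scale covers the outlier, so there is almost no clipping error,
# but the grid step is coarse and rounding error on the bulk of values grows.
wide = quantize(acts, scale=60.0 / 127)

# Narrow range: the scale fits the bulk of the distribution, so rounding error is small,
# but the outlier is clipped heavily.
narrow = quantize(acts, scale=4.0 / 127)

print("wide-range   MSE:", np.mean((acts - wide) ** 2))
print("narrow-range MSE:", np.mean((acts - narrow) ** 2))
```

Neither choice of scale is good once outliers are present, which is why removing the cause of the outliers (attention heads trying to "do nothing") makes transformers easier to quantize.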