
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663
Dec 26, 2023
In this discussion, Markus Nagel, a research scientist at Qualcomm AI Research, shares insights from his recent papers at NeurIPS 2023, focusing on machine learning efficiency. He tackles the challenges of quantizing transformers, particularly how to minimize the outlier issues introduced by attention mechanisms. The conversation explores the pros and cons of pruning versus quantization for model weight compression and dives into methods for multitask and multidomain learning. He also discusses the use of geometric algebra to improve algorithms for robotics.
46:49
Podcast summary created with Snipd AI
Quick takeaways
- Quantizable Transformers address the activation quantization issues introduced by the attention mechanism (see the sketch after this list).
- Pruning and quantization are compared as methods for compressing model weights.
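The intuition behind "helping attention heads do nothing" is that standard softmax can never output an exact zero, so a head that wants to ignore most tokens must push its logits to extreme values, producing activation outliers that make quantization hard. Below is a minimal illustrative sketch of a softmax variant that can reach exact zeros; the function name, the gamma/zeta values, and the precise formulation are assumptions for illustration, not necessarily the exact method from the paper.

```python
import torch
import torch.nn.functional as F

def clipped_softmax(logits: torch.Tensor, gamma: float = -0.03, zeta: float = 1.0,
                    dim: int = -1) -> torch.Tensor:
    """Softmax variant that can output exact zeros.

    Illustrative sketch: stretching the softmax output to [gamma, zeta] with
    gamma < 0 and clipping back to [0, 1] lets an attention head assign
    exactly zero weight to tokens without driving its logits to extreme
    values -- the kind of outliers that hurt activation quantization.
    The hyperparameter values here are placeholders.
    """
    probs = F.softmax(logits, dim=dim)
    stretched = probs * (zeta - gamma) + gamma
    return torch.clamp(stretched, min=0.0, max=1.0)

# With ordinary softmax every weight is strictly positive; with the clipped
# variant, small probabilities are snapped to exactly 0.0, so a head can
# effectively "do nothing" on most tokens.
scores = torch.tensor([[4.0, 0.0, -2.0, -2.0]])
print(F.softmax(scores, dim=-1))   # all entries > 0
print(clipped_softmax(scores))     # small entries become exactly 0.0
```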
Deep dives
Stable Diffusion: World's Fastest Diffusion Model on Mobile Devices
Qualcomm showcased a demo of Stable Diffusion running in under one second, making it the world's fastest diffusion model on mobile devices. This was achieved through full-stack AI optimizations, including model-efficiency techniques such as quantization and knowledge distillation. Multi-stage knowledge distillation, efficient UNet pruning, and guidance distillation were introduced to significantly speed up Stable Diffusion. Together, these optimizations reduced compute and model size, streamlined the diffusion steps, and improved overall performance.
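Quantization here means storing weights (and activations) in low-bit integers instead of 32-bit floats. As a rough illustration of the model-size side of those optimizations, here is a generic per-tensor symmetric int8 weight quantization sketch in PyTorch; it is not Qualcomm's actual pipeline, and the tensor shapes are placeholders.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Per-tensor symmetric int8 quantization: w ~= scale * q, q in [-127, 127]."""
    scale = (w.abs().max() / 127.0).clamp_min(1e-8)   # guard against all-zero tensors
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Placeholder weight matrix standing in for e.g. one layer of a diffusion UNet.
w = torch.randn(512, 512)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("int8 storage is 4x smaller; max abs error:", (w - w_hat).abs().max().item())
```

Note how the scale is set by the largest absolute value in the tensor: a single large outlier inflates the scale and wastes int8 resolution on everything else, which is exactly why the attention-induced outliers discussed above matter for quantization.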