The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663

Exploring Quantization in Transformers

This chapter examines quantization methods for transformers, focusing on improving quantizability by addressing activation outliers. It discusses the role of the attention mechanism, in particular the difficulty attention heads face when they should leave a token's representation essentially unchanged, and compares the traditional softmax formulation with gated attention methods that let a head output a (near) zero update directly. The discussion also notes that quantization generally outperforms pruning, although the specific deployment scenario can influence which technique is preferable.
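The sketch below illustrates the gated-attention idea described in the chapter: a per-token gate lets a head "do nothing" by scaling its output toward zero, instead of forcing extreme pre-softmax values that produce activation outliers. This is a minimal illustration assuming a simple sigmoid gate on a single head; the module structure and parameter names are hypothetical and not the exact formulation discussed in the episode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionHead(nn.Module):
    """One attention head with an explicit gate so it can output (near) zero
    without pushing extreme values through softmax."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.gate = nn.Linear(d_model, 1)  # hypothetical per-token gate
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = attn @ v                    # standard softmax attention output
        g = torch.sigmoid(self.gate(x))   # gate in (0, 1); near 0 means "no update"
        return g * out                    # head can cheaply skip updating a token

# Usage: activations stay in a narrow range because a near-zero gate,
# rather than an outlier-inducing softmax trick, suppresses the update.
x = torch.randn(2, 16, 64)
head = GatedAttentionHead(d_model=64, d_head=16)
print(head(x).shape)  # torch.Size([2, 16, 16])
```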
