Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

CHAPTER

Exploring Quantization in Transformers

This chapter investigates quantization methods for transformers, focusing on making models easier to quantize by addressing the activation outliers that arise when attention heads effectively try to "do nothing," i.e., leave a token's representation unchanged. It compares the standard softmax attention formulation with the proposed gated attention, which lets a head scale its update toward zero without pushing activations to extreme values. The discussion also notes that while quantization generally yields better performance than pruning, the specific deployment scenario can influence the choice between the two techniques.
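
To make the softmax-versus-gated-attention comparison concrete, here is a minimal, hypothetical PyTorch sketch of per-head gated attention along the lines discussed in the episode: a learned sigmoid gate, computed from the token representation, multiplies each head's output so the head can contribute (near) zero without forcing softmax probabilities themselves to collapse. Class and variable names (GatedSelfAttention, gate, etc.) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedSelfAttention(nn.Module):
    """Single attention block with a per-head sigmoid gate on the output (illustrative sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Linear(d_model, n_heads)  # one scalar gate per head, per token
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq_len, d_head)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        # standard scaled dot-product attention
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        head_out = attn @ v  # (batch, heads, seq_len, d_head)

        # sigmoid gate per head and token; a gate near 0 zeroes that head's update,
        # letting the head "do nothing" without extreme pre-softmax activations
        g = torch.sigmoid(self.gate(x))                     # (batch, seq_len, heads)
        head_out = head_out * g.transpose(1, 2).unsqueeze(-1)

        return self.out(head_out.transpose(1, 2).reshape(b, t, -1))
```

With a plain softmax head, "doing nothing" requires pushing attention logits to large magnitudes so the probabilities concentrate on an uninformative token; the gate above provides a cheaper off-switch, which is the intuition behind the improved quantizability discussed in the chapter.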

