Analyzing Outliers in Quantization and Proposed Solutions
The speakers discuss how outliers disrupt the quantization of machine learning models. They explain the root cause of these outliers and how they affect attention heads and the value matrix. The chapter then covers two proposed fixes, clipped softmax and gated attention, compares their performance, and concludes that gated attention works better in all tested models.
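
To make the two methods concrete, here is a minimal sketch in plain Python. It is not taken from the episode. It assumes the commonly described formulations: clipped softmax stretches the softmax output to a slightly wider range and clips it back to [0, 1], and gated attention multiplies a head's output by a sigmoid gate. The names `gamma`, `zeta`, and `gate_logit`, and their default values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_softmax(scores, gamma=-0.03, zeta=1.0):
    """Stretch softmax output to [gamma, zeta], then clip to [0, 1].

    Assumed formulation: with gamma < 0, small probabilities become
    exactly 0, so a head can ignore tokens without needing extreme logits.
    """
    stretched = [(zeta - gamma) * p + gamma for p in softmax(scores)]
    return [min(1.0, max(0.0, s)) for s in stretched]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_row(scores, values, gate_logit):
    """Attention output for one query, scaled by a sigmoid gate.

    Assumed formulation: a gate near 0 lets the head output almost
    nothing, regardless of the attention probabilities. In a real model
    the gate logit would be computed from the input by a learned layer.
    """
    probs = softmax(scores)
    dim = len(values[0])
    attended = [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
    gate = sigmoid(gate_logit)
    return [gate * a for a in attended]
```

In this sketch, `clipped_softmax([10.0, 0.0, 0.0])` returns exact zeros for the two low-scoring tokens, and `gated_attention_row` with a strongly negative `gate_logit` returns an output close to zero. Both are ways for a head to "do nothing" without producing the large activations that make quantization hard.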