Getting Traction in the MLP Layers
We've had a lot of success understanding the attention mechanism in Transformers. The next thread of research is focused on what is going on in the MLP layers. So we're going back to basics and exploring: can we understand why the MLP layers are challenging? We have another paper coming out sometime soon where we look at models that aren't transformers at all, but very simple toy models.