Getting Traction in the MLP Layers
We've had a lot of success understanding the attention mechanism in Transformers. The next thread of research is focused on what is going on in the MLP layers. So we're going back to basics and exploring: can we understand why the MLP layers are challenging? We have another paper coming out sometime soon where we look at models that aren't transformers at all, but very simple toy models.