
Catherine Olsson and Nelson Elhage: Anthropic, Understanding Transformers

The Gradient: Perspectives on AI

CHAPTER

Getting Traction in the MLP Layers

The researchers have had a lot of success understanding the attention mechanism in Transformers, and they are focusing their next thread of research on what is going on in the MLP layers. As they put it, they are going back to basics and exploring whether they can understand why those layers are challenging. Another paper coming out soon looks at other toy models that aren't transformers at all, but much simpler models.

