
#123 Aidan Gomez: How AI Language Models Will Shape The Future

Eye On A.I.

CHAPTER

The Multi-Layer Perceptron for Language Learning

Multi-layer perceptron sounds like deep neural-net jargon, but that's the fundamental unit. Before transformers, there were these very complicated LSTM architectures with gates and all of these confusing bits and bobs that just made them work. With the transformer, all of that was torn away, and the layer became an MLP plus one attention block. It turns out to be extraordinarily powerful. The architecture is not this hyper-complex beast; it's actually just a very simple, scalable, compute-saturating unit.
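The "MLP plus one attention" layer described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the production architecture: it omits layer normalization, multiple heads, masking, and learned embeddings, and all weight names (`Wq`, `W1`, etc.) are placeholders with random values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def mlp(x, W1, b1, W2, b2):
    # Two-layer perceptron with a ReLU nonlinearity.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_layer(x, p):
    # One transformer layer: attention + MLP, each with a residual connection.
    x = x + self_attention(x, p["Wq"], p["Wk"], p["Wv"])
    x = x + mlp(x, p["W1"], p["b1"], p["W2"], p["b2"])
    return x

rng = np.random.default_rng(0)
d_model, seq_len, d_hidden = 8, 4, 16  # toy sizes for illustration
params = {
    "Wq": rng.normal(size=(d_model, d_model)),
    "Wk": rng.normal(size=(d_model, d_model)),
    "Wv": rng.normal(size=(d_model, d_model)),
    "W1": rng.normal(size=(d_model, d_hidden)), "b1": np.zeros(d_hidden),
    "W2": rng.normal(size=(d_hidden, d_model)), "b2": np.zeros(d_model),
}
x = rng.normal(size=(seq_len, d_model))   # a toy sequence of 4 token vectors
y = transformer_layer(x, params)
print(y.shape)  # same shape as the input: (4, 8)
```

Stacking many copies of this one simple layer is the whole scaling story: no gates, no recurrence, just matrix multiplies that saturate available compute.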

