
Neural Network Pruning and Training with Jonathan Frankle at MosaicML

Gradient Dissent: Conversations on AI


Transformers and Attention: A Simple Architecture

Self-attention was already bouncing around the literature in various ways. The vision transformer is still something I almost never get asked about at Mosaic; it's an academic curiosity by and large. Convolutional networks have stood the test of time, and recurrent networks will as well. These little inductive-bias insights are pretty hard to come by.
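
To ground the "simple architecture" remark, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation the excerpt refers to. The function name, dimensions, and random projection matrices are illustrative assumptions, not anything discussed in the episode.

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model).

        Illustrative sketch only; w_q, w_k, w_v are assumed projection matrices.
        """
        q = x @ w_q  # queries
        k = x @ w_k  # keys
        v = x @ w_v  # values
        d_k = k.shape[-1]
        scores = q @ k.T / np.sqrt(d_k)  # pairwise attention logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v  # each token's output is a weighted sum of values

    # Toy usage: 4 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)

The appeal of the design, as the quote suggests, is that this single content-based mixing operation carries almost no built-in inductive bias, in contrast to the locality of convolutions or the sequential recurrence of RNNs.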

