
Neural Network Pruning and Training with Jonathan Frankle at MosaicML

Gradient Dissent: Conversations on AI


Transformers and Attention: A Simple Architecture

Self-attention was already bouncing around the literature in various ways. The vision transformer is still something I almost never get asked about at Mosaic; it's an academic curiosity by and large. Convolutional networks have stood the test of time, and recurrent networks will as well. These little inductive-bias insights are pretty hard to come by.
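
To ground the "simple architecture" remark, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation the excerpt refers to. The function name, dimensions, and random projection matrices are illustrative assumptions, not anything discussed in the episode.

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model).

        Illustrative sketch only; w_q, w_k, w_v are assumed projection matrices.
        """
        q = x @ w_q  # queries
        k = x @ w_k  # keys
        v = x @ w_v  # values
        d_k = k.shape[-1]
        scores = q @ k.T / np.sqrt(d_k)  # pairwise attention logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v  # each token's output is a weighted sum of values

    # Toy usage: 4 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)

The appeal of the design, as the quote suggests, is that this single content-based mixing operation carries almost no built-in inductive bias, in contrast to the locality of convolutions or the sequential recurrence of RNNs.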

