AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Attention Is All You Need
The transformer is a magnificent neural network architecture, because it is a general purpose differentiable computer. It's simultaneously expressive in the forward pass, optimizable via back propagation gradient descent, and efficient high parallelism compute graph. You want to have a general purpose computer that you can train on arbitrary problems like next word prediction or detecting if there's a cat in an image. If it was too grand, it would over promise and then under deliver potentially. That should be a t-shirt.