
"Why I think strong general AI is coming soon" by Porby
LessWrong (Curated & Popular)
The Future of Machine Learning
Transformers can't learn to encode and decode their own memory directly in the same sense that an RNN can. The more incremental a sequence is, the less the model actually has to compute at each step. Transformers are not special, yet they dominate large language models: GPT-1, GPT-2, and GPT-3 are effectively the same architecture, just scaled up. There may not need to be any other breakthroughs.
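A minimal PyTorch sketch of the distinction the summary gestures at (the dimensions and module choices here are illustrative assumptions, not anything from the episode): an RNN carries a learned hidden state that it writes and reads at every step, while a transformer layer has no recurrent state and instead re-attends over its fixed context window.

```python
import torch
import torch.nn as nn

d = 16                       # hypothetical embedding width, for illustration
seq = torch.randn(1, 8, d)   # batch of 1, sequence of 8 token embeddings

# RNN: memory is an explicit hidden state the network itself updates each
# step, so it must learn its own scheme for encoding and decoding the past.
rnn_cell = nn.RNNCell(input_size=d, hidden_size=d)
h = torch.zeros(1, d)
for t in range(seq.shape[1]):
    h = rnn_cell(seq[0, t].unsqueeze(0), h)  # h carries all past information

# Transformer: no recurrent state. Every position simply re-reads the raw
# context; "memory" is just the window of past activations it attends over.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=2, batch_first=True)
out, _ = attn(seq, seq, seq)  # each position attends over the whole context
```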


