
Artificial Intelligence & Large Language Models: Oxford Lecture — #35
Manifold
Attention Is All You Need
The idea was that this nonlinear property I mentioned, which is that word order matters and matters differently in different languages, means you need an architecture that can take that into account. And so the real innovation in this paper is to introduce these three big matrices, which are 10,000 by 10,000 dimensional matrices: Q, K, and V. So when you're done with your training, they're fixed. But interestingly, more training further refines them. If you come across more data, or you can run longer, you get more refined versions of these matrices.
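For listeners who want to see what those Q, K, and V matrices do mechanically, here is a minimal NumPy sketch of single-head scaled dot-product attention. The dimensions (d_model = 512, d_k = 64) follow the base model in the "Attention Is All You Need" paper rather than the rough 10,000-by-10,000 figure quoted above, and the random weights stand in for the learned projections that training would fix in place.

```python
# Minimal sketch of single-head scaled dot-product attention.
# d_model=512 and d_k=64 follow the paper's base model; the random
# projection weights below are illustrative stand-ins, not trained values.
import numpy as np

d_model, d_k = 512, 64
rng = np.random.default_rng(0)

# The three learned projection matrices; after training they are fixed.
W_Q = rng.normal(scale=d_model**-0.5, size=(d_model, d_k))
W_K = rng.normal(scale=d_model**-0.5, size=(d_model, d_k))
W_V = rng.normal(scale=d_model**-0.5, size=(d_model, d_k))

def attention(X):
    """X: (seq_len, d_model) token embeddings -> (seq_len, d_k) outputs."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    # Scores say how strongly each token attends to every other token,
    # which is how the model accounts for word order and context.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

X = rng.normal(size=(10, d_model))   # a sequence of 10 token embeddings
print(attention(X).shape)            # (10, 64)
```

The key point the speaker is making holds here: once W_Q, W_K, and W_V are learned, they are just fixed matrices applied to every input, and further training on more data simply refines their entries.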