
Artificial Intelligence & Large Language Models: Oxford Lecture — #35


CHAPTER

Attention Is All You Need

The idea was that this nonlinear property I mentioned, which is that word order matters and matters differently in different languages, means you need an architecture that can take that into account. And so the real innovation in this paper is to introduce these three big matrices, which are 10,000 by 10,000 dimensional matrices: Q, K and V. So when you're done with your training, they're fixed. But interestingly, more training further refines them. If you come across more data or you can run longer, you get more refined versions of these matrices.
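The Q, K, and V matrices described above can be illustrated with a minimal NumPy sketch of single-head scaled dot-product attention, the mechanism from the "Attention Is All You Need" paper. The dimensions here (`d_model=16`, `d_k=8`, five tokens) are tiny illustrative placeholders, not the sizes mentioned in the episode, and the function and variable names are this sketch's own.

```python
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    """Project each token embedding with the learned matrices W_q, W_k, W_v,
    then mix the value vectors according to query-key similarity."""
    Q = X @ W_q                       # queries
    K = X @ W_k                       # keys
    V = X @ W_v                       # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # pairwise token similarities
    # row-wise softmax: each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                # one contextualized vector per token

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 16, 8, 5
X = rng.normal(size=(seq_len, d_model))   # 5 token embeddings
W_q = rng.normal(size=(d_model, d_k))     # fixed after training,
W_k = rng.normal(size=(d_model, d_k))     # but further training
W_v = rng.normal(size=(d_model, d_k))     # would keep refining them

out = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(out.shape)  # (5, 8)
```

The projection matrices play the role of the trained weights the speaker describes: once training stops they are frozen, and the attention computation above is all that happens at inference time.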

