The Importance of Hardware Architecture for Attention-Specific Models
The focus has been on speeding up matrix multiplications, right? The faster we get those, that's kind of all we need. But now that attention is all we need, are there specific structures that you see evolving in hardware to accommodate attention? Are there registers you might build into hardware that are attention-specific, or designed to accommodate the unique properties of the transformer?

Yeah, great question. When you look at these Prometheus-style models coming out of the OpenAI team, they're, what, 175 billion parameters or so, and GPT-4 is releasing this week or next week. I'm pretty sure it's north of 200 billion parameters. So the question really becomes,
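For context on the matmul point raised above: the core of transformer attention does reduce to a couple of large matrix multiplications wrapped around a softmax, which is why matmul throughput dominates the hardware conversation. The sketch below is a minimal, illustrative single-head scaled dot-product attention in plain NumPy; the shapes (128 tokens, 64-dimensional head) are arbitrary and not taken from the episode.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: two large matmuls around a softmax."""
    d_k = Q.shape[-1]
    # Matmul 1: query-key similarity scores, shape (seq_len, seq_len)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys -- the main non-matmul step hardware also has to handle
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Matmul 2: attention-weighted sum of values, shape (seq_len, d_v)
    return weights @ V

# Illustrative shapes only: 128 tokens, 64-dim head
rng = np.random.default_rng(0)
Q = rng.standard_normal((128, 64))
K = rng.standard_normal((128, 64))
V = rng.standard_normal((128, 64))
out = scaled_dot_product_attention(Q, K, V)  # (128, 64)
```

The two `@` products are where an accelerator spends nearly all of its time, while the softmax and scaling are the "attention-specific" pieces the question is asking about accommodating in hardware.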