The State Space Model for Deep Learning
The inspiration came from this signal processing context. So the first step is really, okay, we have this primitive that we think should work. And I've had great labmates who have been kind of looking at state space models for years. But we always ran into this problem where if you just took a state space model and swapped out attention in a transformer, we'd see gaps of five perplexity points. For some context, that's the difference between, like, a 100 million parameter model and a 10 billion parameter model. It's a big gap in quality and not something that we'd like to see.