The Bottleneck of the State Space Model
So for our work, it was actually a quality problem. If you take a standard state space model and just stick it into language, it doesn't perform super well.

But even before we get to the state space solution, on the general issue of the quadratic relationship between sequence length and compute and memory: does that become a limitation primarily at train time, or at inference time, or something else?

It's actually both. If you're training a model whose cost scales quadratically in sequence length, then in order to get it to learn the long-term dependencies that you'd want from long sequences, you have to train it on very long sequences. And that's a quadratic blowup in compute and memory.
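A minimal sketch of the contrast being described, not the speaker's actual model: naive self-attention materializes an L x L score matrix, so compute and memory grow quadratically with sequence length L, while a state space recurrence carries a fixed-size hidden state and grows linearly. The tensor shapes, state size, and the specific recurrence h_t = A h_{t-1} + B x_t below are illustrative assumptions.

```python
import torch

def attention_scores(x, W_q, W_k):
    """Naive self-attention scores: the (L, L) matrix is the quadratic cost."""
    q = x @ W_q                     # (L, d)
    k = x @ W_k                     # (L, d)
    scores = q @ k.T                # (L, L) -- O(L^2) compute and memory
    return scores.softmax(dim=-1)

def ssm_recurrence(x, A, B, C):
    """Linear-time state space recurrence: one fixed-size state update per token."""
    h = torch.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]        # state update: cost independent of L per step
        ys.append(C @ h)            # per-token readout
    return torch.stack(ys)          # (L, d)

# Small runnable demo, then a scaling comparison at longer lengths.
d, d_state = 64, 16
x = torch.randn(512, d)
W_q, W_k = torch.randn(d, d), torch.randn(d, d)
A, B, C = 0.9 * torch.eye(d_state), torch.randn(d_state, d), torch.randn(d, d_state)
print(attention_scores(x, W_q, W_k).shape)   # torch.Size([512, 512])
print(ssm_recurrence(x, A, B, C).shape)      # torch.Size([512, 64])

for L in (1_000, 8_000):
    attn_entries = L * L            # grows 64x when L grows 8x
    ssm_updates = L * d_state       # grows 8x when L grows 8x
    print(f"L={L:>5}: attention score entries = {attn_entries:>12,}, SSM state updates = {ssm_updates:>9,}")
```

The point of the toy numbers: going from a 1K to an 8K training context multiplies the attention score matrix by 64x, whereas the recurrent state updates only grow 8x, which is why training on very long sequences is where the quadratic term bites first.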