Can SSMs Do More Than Transformers?
SSMs have the property that they can be computed in almost linear time. So just from a computational standpoint, you can use much longer sequences. And that's one example of where SSMs have shown a lot more power than transformers.
Jason: Can you give me an example of a long-range task that an SSM would do much better on than a transformer?
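As a rough illustration of the near-linear-time point made above, here is a minimal sketch, assuming a discrete-time SSM with a diagonal state matrix. The names `ssm_scan`, `attention_pair_count`, `a`, `b`, and `c` are hypothetical and do not come from the conversation. The recurrence makes one state update per input step, so its cost grows linearly with sequence length, while full self-attention scores every pair of positions. The "almost" in the remark may refer to convolution-based SSM implementations that cost O(L log L), which this sketch does not cover.

```python
def ssm_scan(u, a, b, c):
    """Run a diagonal linear SSM over the input sequence u.

    State update: x_k = a * x_{k-1} + b * u_k  (elementwise over the state)
    Output:       y_k = sum(c * x_k)

    a, b, c are equal-length lists of floats (illustrative parameters).
    """
    state = [0.0] * len(a)
    outputs = []
    for u_k in u:  # single pass over the sequence: O(L * N) work
        state = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return outputs


def attention_pair_count(length):
    """Number of query-key scores that full self-attention computes: L * L."""
    return length * length
```

For example, `ssm_scan([1.0, 0.0, 0.0], [0.5], [1.0], [1.0])` feeds in a single impulse and reads out the decaying state. Doubling the sequence length doubles the loop iterations in `ssm_scan`, but quadruples `attention_pair_count`.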