Is There a Difference in Attention Architectures?
In 2014, when we introduced the modern form of attention that is in transformers these days, we were quite aware that human attention is more like this hard, probabilistic, stochastic phenomenon where you choose one thing or the other. It's just that we didn't have the algorithms to conveniently train a system with stochastic hard attention. And I'm now kind of thinking we can design much better algorithms for learning to attend in a way that makes stochastic hard decisions, just like with conscious attention.
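To make the contrast in the remark above concrete, here is a minimal sketch in Python of the two styles of attention. It is an illustration under stated assumptions, not the speaker's method: the function names (`soft_attention`, `hard_attention`), the scalar values, and the example scores are all hypothetical. Soft attention blends every value using softmax weights; stochastic hard attention samples a single item from that same distribution.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(scores, values):
    """Soft attention: a deterministic weighted average over all values."""
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))


def hard_attention(scores, values, rng):
    """Stochastic hard attention: sample one index and return only that value."""
    weights = softmax(scores)
    index = rng.choices(range(len(values)), weights=weights, k=1)[0]
    return index, values[index]


# Hypothetical example inputs, chosen only for illustration.
scores = [2.0, 0.5, -1.0]
values = [10.0, 20.0, 30.0]
rng = random.Random(0)  # seeded so the sampled choice is repeatable

soft = soft_attention(scores, values)              # a blend of all three values
index, hard = hard_attention(scores, values, rng)  # exactly one of the values
```

The soft version is a smooth function of the scores, so it can be trained directly with backpropagation. The sampling step in the hard version is not differentiable, which is one reason training it has usually required other estimators.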