

Learning Long-Time Dependencies with RNNs w/ Konstantin Rusch - #484
May 17, 2021
In this discussion, Konstantin Rusch, a PhD student at ETH Zurich, dives into recurrent neural network (RNN) architectures designed to learn long-time dependencies. He shares insights from his neuroscience-inspired coRNN and UnICORNN papers and explains how these architectures compare to traditional models like LSTMs. Konstantin also discusses the challenge of keeping gradients stable, the techniques that preserve these RNNs' expressive power, and his plans for future work on memory efficiency and performance.
Vanishing/Exploding Gradient Problem
- RNNs struggle to learn long-time dependencies due to the vanishing/exploding gradient problem during backpropagation.
- This arises because backpropagation through time multiplies one Jacobian factor per step, and that long product shrinks or grows exponentially with sequence length (see the numerical sketch below).
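
To make that concrete, here is a rough numerical illustration (mine, not from the episode): for a linear recurrence the backpropagated gradient contains the matrix power W^T, whose norm decays or blows up exponentially depending on the spectral radius of W.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32  # hidden size (illustrative)

def product_norm(T, spectral_radius):
    """Norm of W^T for a random recurrent matrix W rescaled to the given spectral radius."""
    W = rng.normal(size=(n, n))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    J = np.eye(n)
    for _ in range(T):
        J = W @ J            # one Jacobian factor per time step
    return np.linalg.norm(J)

for T in (10, 100, 500):
    print(f"T={T:4d}   radius 0.9 -> {product_norm(T, 0.9):.2e}   "
          f"radius 1.1 -> {product_norm(T, 1.1):.2e}")
# radius < 1: the product's norm collapses toward 0 (vanishing gradient);
# radius > 1: it grows without bound (exploding gradient).
```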
Neuroscience-Inspired RNNs
- Konstantin's approach to RNNs is inspired by neuroscience, specifically the oscillatory behavior of neurons.
- Oscillatory behavior offers stability, both in the system's state and its gradient, potentially enabling long-time dependency learning (a toy update rule is sketched below).
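
As a concrete picture of the idea, here is a minimal NumPy sketch of a coRNN-style update: a network of damped, driven oscillators advanced with a semi-implicit Euler step. The parameter names, default constants, and exact discretization are my illustrative assumptions, not necessarily the paper's precise scheme.

```python
import numpy as np

def cornn_step(y, z, u, W, Wz, V, b, dt=0.01, gamma=1.0, eps=1.0):
    """One step of an oscillatory recurrence (illustrative sketch).

    y is the hidden state ("position"), z its velocity, u the current input.
    The second-order ODE  y'' = tanh(W y + Wz y' + V u + b) - gamma*y - eps*y'
    is advanced by updating the velocity first (damping handled implicitly),
    then the state.
    """
    z = (z + dt * (np.tanh(W @ y + Wz @ z + V @ u + b) - gamma * y)) / (1.0 + dt * eps)
    y = y + dt * z
    return y, z
```

Because the forcing term is bounded and the oscillators are damped, iterating this step keeps the state from blowing up for reasonable dt, gamma, and eps, which is the kind of stability the snip refers to.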
coRNN Performance on Adding Problem
- Konstantin benchmarked coRNN on the adding problem, a synthetic long-time dependency task.
- coRNN converged directly even on sequences of length 5,000, outperforming LSTMs, which failed at shorter lengths (the task setup is sketched below).
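
For reference, the adding problem is typically set up like this: each input step is a (value, marker) pair, exactly two markers are 1, and the target is the sum of the two marked values, so the model must carry information across the whole sequence. The generator below is a sketch of one common variant, not the episode's exact benchmark code.

```python
import numpy as np

def adding_problem_batch(batch_size, seq_len, rng=None):
    """Generate a batch for the adding problem (one common variant)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    for i in range(batch_size):
        # mark one position in each half so the two relevant inputs are far apart
        markers[i, rng.integers(0, seq_len // 2)] = 1.0
        markers[i, rng.integers(seq_len // 2, seq_len)] = 1.0
    inputs = np.stack([values, markers], axis=-1)   # shape (batch, seq_len, 2)
    targets = (values * markers).sum(axis=1)        # shape (batch,)
    return inputs, targets

x, y = adding_problem_batch(4, 5000)  # the sequence length mentioned in the snip
```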