
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
The Unreasonable Effectiveness of the Forget Gate with Jos van der Westhuizen - TWiML Talk #240
Mar 18, 2019
Jos van der Westhuizen, a PhD student at Cambridge University, discusses his accidental journey from biomedical engineering to machine learning. He dives into the importance of the forget gate in LSTMs, revealing how it boosts computational efficiency. The conversation also covers JANET, his simplified LSTM architecture that keeps only the forget gate. Jos emphasizes selective learning and why managing what to forget is key to optimizing neural networks. Tune in to hear about the future of simpler, more efficient neural network designs!
AI Snips
Accidental ML Journey
- Jos van der Westhuizen's path to machine learning was accidental, starting in biomedical engineering and computational neuroscience.
- He initially aimed to create a wristwatch for comprehensive health diagnostics, but pivoted towards machine learning after encountering temporal modeling techniques.
LSTM Gates and Gradients
- Plain recurrent neural networks (RNNs) suffer from vanishing and exploding gradients because backpropagation through time pushes conflicting updates through the same recurrent weights at every timestep.
- LSTMs mitigate this with input, output, and forget gates, which manage what the cell state remembers and what it discards (see the equations below).
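
For reference, a standard formulation of the LSTM gates and cell-state update (notation varies slightly across papers):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(additive cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because the cell update is additive, the gradient flowing from c_t back to c_{t-1} is scaled by the forget gate rather than repeatedly multiplied by the same recurrent weight matrix, which is what makes plain RNN gradients vanish or explode.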
LSTM Gate Functions
- A typical LSTM uses three gates: input, output, and forget.
- The input gate controls how much new information enters the cell at each timestep, the output gate controls how much of the cell state is passed on as the hidden state, and the forget gate controls how much of the previous cell state is kept (see the sketch below).
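
As a rough illustration of how the gates interact, here is a minimal NumPy sketch of a single LSTM timestep, plus a JANET-style variant that keeps only the forget gate, as discussed in the episode. Parameter packing, shapes, and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One standard LSTM timestep.

    x: input of size m; h_prev, c_prev: previous hidden/cell state of size n.
    W (4n, m), U (4n, n), b (4n,) stack the parameters for the input (i),
    forget (f), output (o) gates and the candidate cell contents (g).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all pre-activations in one shot
    i = sigmoid(z[0 * n:1 * n])         # input gate: how much new info enters
    f = sigmoid(z[1 * n:2 * n])         # forget gate: how much old state is kept
    o = sigmoid(z[2 * n:3 * n])         # output gate: how much state is exposed
    g = np.tanh(z[3 * n:4 * n])         # candidate cell contents
    c = f * c_prev + i * g              # additive cell-state update
    h = o * np.tanh(c)                  # hidden state for the next timestep
    return h, c

def janet_step(x, h_prev, c_prev, W, U, b):
    """A JANET-style timestep keeping only the forget gate (sketch, not the paper's code).

    W (2n, m), U (2n, n), b (2n,) stack the forget-gate and candidate parameters.
    The write term is coupled to (1 - f) and the output gate is dropped.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0 * n:1 * n])         # the single remaining gate
    g = np.tanh(z[1 * n:2 * n])         # candidate cell contents
    c = f * c_prev + (1.0 - f) * g      # forgetting and writing share one gate
    return c, c                         # hidden state is just the cell state
```

In the JANET-style step, one gate decides both how much old state to keep and how much new information to admit, which is roughly what "managing what to forget" amounts to in practice.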

