Giancarlo Kerg (Google Scholar) is a PhD student at Mila, supervised by Yoshua Bengio and Guillaume Lajoie. He works on out-of-distribution generalization and modularity in memory-augmented neural networks.
Highlights from our conversation:
🧮 A pure-math foundation as a path to progress and structural understanding in deep learning research
🧠 How formally proving that self-attention mitigates vanishing gradients when capturing long-term dependencies in RNNs led to a relevancy screening mechanism resembling human memory consolidation
🎯 Out-of-distribution generalization through modularity and inductive biases