

Single Headed Attention RNN: Stop Thinking With Your Head with Stephen Merity - #325
Dec 12, 2019
Stephen Merity, an NLP and deep learning researcher at DDX Times, shares insights into his work on the Single Headed Attention RNN. He explains his motivations for developing the model and contrasts it with conventional transformers. Merity emphasizes the importance of efficient model benchmarking, describing how he made training accessible on a single GPU. He also discusses the value of diversifying AI research, encouraging exploration beyond ever-larger models, and reflects on the balance between academic writing and accessibility.
AI Snips
Language Model Core Function
- Language models, at their core, predict the next token (character, word, or subword) in a sequence (see the sketch after this list).
- Scaling up these models reveals surprising knowledge about language and its structure, even though the training objective involves no explicit labels or supervision.
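A minimal sketch of what "predict the next token" means in practice, using a tiny character-level LSTM language model in PyTorch. This is illustrative only; the model, hyperparameters, and training text are assumptions, not code from the episode or from Merity's work.

```python
# Illustrative next-token prediction: inputs are a character sequence,
# targets are the same sequence shifted by one position.
import torch
import torch.nn as nn

text = "language models predict the next token in a sequence"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

ids = torch.tensor([stoi[ch] for ch in text])
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)

class CharLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)  # logits over the next character at each step

model = CharLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```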
Sentiment Neuron Example
- OpenAI's Sentiment Neuron, trained solely on predicting the next character in Amazon reviews, accurately identified review sentiment.
- It could even generate positive or negative reviews, demonstrating a deeper grasp of language.
Challenging Transformer Dominance
- Stephen Merity's research aims to challenge the dominance of transformer architectures in language modeling.
- He pairs an LSTM with a single attention head to achieve comparable results with far fewer resources, making the work accessible to researchers with a single GPU (a minimal sketch of the idea follows this list).
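A hedged sketch of the core idea behind a single-headed-attention RNN: an LSTM layer followed by one attention head (a single query/key/value projection, with no multi-head split) applied over the LSTM's hidden states. Layer sizes, module names, and the residual/normalization layout are assumptions for illustration, not the exact architecture from the paper.

```python
# Sketch: one LSTM block with a single causal attention head over its outputs.
import math
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), 1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

class ShaRnnBlock(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.attn = SingleHeadAttention(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.norm(h + self.attn(h))  # residual attention over LSTM states

block = ShaRnnBlock()
out = block(torch.randn(2, 16, 128))  # (batch, seq_len, dim)
print(out.shape)  # torch.Size([2, 16, 128])
```

The design point the snip makes is that one recurrent layer plus a single attention head keeps memory and compute low enough to train on one GPU, in contrast to stacking many multi-head attention layers.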