

Single Headed Attention RNN: Stop Thinking With Your Head with Stephen Merity - #325
Dec 12, 2019
Stephen Merity, an NLP and deep learning researcher at DDX Times, shares insights into his work on the Single Headed Attention RNN. He explains his motivations for developing the model and contrasts it with conventional transformers. Merity emphasizes the importance of efficient model benchmarking, describing how he made training accessible on a single GPU. He also discusses the value of diversifying AI research, encouraging exploration beyond ever-larger models, and reflects on the balance between academic writing and accessibility.
AI Snips
Language Model Core Function
- Language models, at their core, predict the next token (character, word, or subword) in a sequence (see the sketch after this list).
- Scaling up these models reveals surprising knowledge about language and its structure, even though the training objective involves no explicit labels or supervision.
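A minimal sketch of what "predict the next token" means in practice, using a tiny character-level LSTM language model in PyTorch. This is illustrative only; the model, hyperparameters, and training text are assumptions, not code from the episode or from Merity's work.

```python
# Illustrative next-token prediction: inputs are a character sequence,
# targets are the same sequence shifted by one position.
import torch
import torch.nn as nn

text = "language models predict the next token in a sequence"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

ids = torch.tensor([stoi[ch] for ch in text])
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)

class CharLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)  # logits over the next character at each step

model = CharLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```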
Sentiment Neuron Example
- OpenAI's Sentiment Neuron, trained solely on predicting the next character in Amazon reviews, accurately identified review sentiment.
- It could even generate positive or negative reviews, demonstrating a deeper grasp of language.
Challenging Transformer Dominance
- Stephen Merity's research aims to challenge the dominance of transformer architectures in language modeling.
- He pairs an LSTM with a single attention head to achieve comparable results with far fewer resources, making the work accessible to researchers with a single GPU (a minimal sketch of the idea follows this list).
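A hedged sketch of the core idea behind a single-headed-attention RNN: an LSTM layer followed by one attention head (a single query/key/value projection, with no multi-head split) applied over the LSTM's hidden states. Layer sizes, module names, and the residual/normalization layout are assumptions for illustration, not the exact architecture from the paper.

```python
# Sketch: one LSTM block with a single causal attention head over its outputs.
import math
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool), 1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

class ShaRnnBlock(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.attn = SingleHeadAttention(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.norm(h + self.attn(h))  # residual attention over LSTM states

block = ShaRnnBlock()
out = block(torch.randn(2, 16, 128))  # (batch, seq_len, dim)
print(out.shape)  # torch.Size([2, 16, 128])
```

The design point the snip makes is that one recurrent layer plus a single attention head keeps memory and compute low enough to train on one GPU, in contrast to stacking many multi-head attention layers.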