Hypothesis and Formation of Representations in Transformers
Discussion on the hypothesis that transformers generalize hierarchically based on tree structured representations organized across the layers.
In episode 93 of The Gradient Podcast, Daniel Bashir speaks to Professor Tal Linzen.
Professor Linzen is an Associate Professor of Linguistics and Data Science at New York University and a Research Scientist at Google. He directs the Computation and Psycholinguistics Lab, where he and his collaborators use behavioral experiments and computational methods to study how people learn and understand language. They also develop methods for evaluating, understanding, and improving computational systems for language processing.
Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub
Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter
Outline:
* (00:00) Intro
* (02:25) Prof. Linzen’s background
* (05:37) Back and forth between psycholinguistics and deep learning research, LM evaluation
* (08:40) How can deep learning successes/failures help us understand human language use, methodological concerns, comparing human representations to LM representations
* (14:22) Behavioral capacities and degrees of freedom in representations
* (16:40) How LMs are becoming less and less like humans
* (19:25) Assessing LSTMs’ ability to learn syntax-sensitive dependencies
* (22:48) Similarities between structure-sensitive dependencies, sophistication of syntactic representations
* (25:30) RNNs implicitly implement tensor-product representations—vector representations of symbolic structures
* (29:45) Representations required to solve certain tasks, difficulty of natural language
* (33:25) Accelerating progress towards human-like linguistic generalization
* (34:30) The pre-training-agnostic identically distributed (PAID) evaluation paradigm
* (39:50) Ways to mitigate differences in evaluation
* (44:20) Surprisal does not explain syntactic disambiguation difficulty
* (45:00) How to measure processing difficulty, predictability and processing difficulty
* (49:20) What other factors influence processing difficulty?
* (53:10) How to plant trees in language models
* (55:45) Architectural influences on generalizing knowledge of linguistic structure
* (58:20) “Cognitively relevant regimes” and speed of generalization
* (1:00:45) Acquisition of syntax and sampling simpler vs. more complex sentences
* (1:04:03) Curriculum learning for progressively more complicated syntax
* (1:05:35) Hypothesizing tree-structured representations
* (1:08:00) Reflecting on a prediction from the past
* (1:10:15) Goals and “the correct direction” in AI research
* (1:14:04) Outro
Links:
* Prof. Linzen’s Twitter and homepage
* Papers
  * Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
  * RNNs Implicitly Implement Tensor-Product Representations
  * How Can We Accelerate Progress Towards Human-like Linguistic Generalization?