
[22] Graham Neubig - Unsupervised Learning of Lexical Information
The Thesis Review
The Parametric Versus Non-Parametric Difference in Machine Learning
One of the things my thesis learned was a kind of, quote unquote, vocabulary. So if you're familiar with BPE, for example, we have the BPE vocabulary. And when you're learning BPE or a SentencePiece model or something like that, you specify the vocabulary size ahead of time: you say I want 16,000 or I want 32,000 tokens. What the non-parametric model allows you to do in this case is that it will learn as many tokens in the vocabulary as it needs to fit the data, but the prior pulls back to reduce the size of the vocabulary.
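The contrast above can be illustrated with a toy sketch. The following is not Neubig's actual model, just a minimal Chinese restaurant process simulation, a standard non-parametric construction in which the number of "tables" (here standing in for vocabulary items) is not fixed in advance: it grows with the data, while a concentration parameter `alpha` plays the role of the prior pulling the vocabulary size down.

```python
import random

def crp_vocab_size(n_tokens, alpha, seed=0):
    """Simulate a Chinese Restaurant Process. Each incoming token
    occurrence either joins an existing vocabulary item with probability
    proportional to that item's count, or creates a new item with
    probability alpha / (n + alpha). The resulting number of items is
    the learned vocabulary size -- never specified ahead of time."""
    rng = random.Random(seed)
    counts = []  # occurrence count per vocabulary item
    for n in range(n_tokens):
        if rng.random() < alpha / (n + alpha):
            counts.append(1)  # open a new vocabulary item
        else:
            # join an existing item, proportional to its count
            r = rng.random() * n
            acc = 0
            for i, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[i] += 1
                    break
    return len(counts)

# A smaller alpha means a stronger prior pull toward a small vocabulary.
small_vocab = crp_vocab_size(5000, alpha=1.0)
large_vocab = crp_vocab_size(5000, alpha=50.0)
```

With the same 5,000 tokens, the low-`alpha` run ends up with far fewer vocabulary items than the high-`alpha` run, mirroring the trade-off described in the quote: the model can grow the vocabulary as needed, but the prior keeps it compact.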