AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Parametric Versus Non-Parametric Difference in Machine Learning
One of the things my thesis learned was kind of a quote unquote vocabulary. So if you're familiar with BPE for example, we have the BPE vocabulary. And when you're learning BPE or a sentence piece or something like that, you specify the vocabulary ahead of time. You say I want 16,000 or I want 32,000 tokens. What the non-parametric model allows you to do in this case is it will learn as many tokens in the vocabulary as it needs to fit the model,. But the prior pulls back to reduce the size of the vocabulary.