
[22] Graham Neubig - Unsupervised Learning of Lexical Information
The Thesis Review
The Parametric Versus Non-Parametric Difference in Machine Learning
One of the things my thesis learned was a kind of, quote unquote, vocabulary. So if you're familiar with BPE, for example, we have the BPE vocabulary. And when you're learning BPE or a SentencePiece model or something like that, you specify the vocabulary size ahead of time: you say I want 16,000 or I want 32,000 tokens. What the non-parametric model allows you to do in this case is that it will learn as many tokens in the vocabulary as it needs to fit the data, but the prior pulls back to reduce the size of the vocabulary.
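The contrast above can be illustrated with a toy sketch. The following is not Neubig's actual model, just a minimal Chinese restaurant process simulation, a standard non-parametric construction in which the number of "tables" (here standing in for vocabulary items) is not fixed in advance: it grows with the data, while a concentration parameter `alpha` plays the role of the prior pulling the vocabulary size down.

```python
import random

def crp_vocab_size(n_tokens, alpha, seed=0):
    """Simulate a Chinese Restaurant Process. Each incoming token
    occurrence either joins an existing vocabulary item with probability
    proportional to that item's count, or creates a new item with
    probability alpha / (n + alpha). The resulting number of items is
    the learned vocabulary size -- never specified ahead of time."""
    rng = random.Random(seed)
    counts = []  # occurrence count per vocabulary item
    for n in range(n_tokens):
        if rng.random() < alpha / (n + alpha):
            counts.append(1)  # open a new vocabulary item
        else:
            # join an existing item, proportional to its count
            r = rng.random() * n
            acc = 0
            for i, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[i] += 1
                    break
    return len(counts)

# A smaller alpha means a stronger prior pull toward a small vocabulary.
small_vocab = crp_vocab_size(5000, alpha=1.0)
large_vocab = crp_vocab_size(5000, alpha=50.0)
```

With the same 5,000 tokens, the low-`alpha` run ends up with far fewer vocabulary items than the high-`alpha` run, mirroring the trade-off described in the quote: the model can grow the vocabulary as needed, but the prior keeps it compact.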