Is the kNN Model a Better Language Model?

The model is far more powerful than just doing lexical, local pattern matching. It's learning generalizable representations, which is why it can improve performance by such huge margins.

And yeah, you also mentioned interpolating the two distributions, right? In general, if you didn't have interpolation, if you just had the nearest neighbor language model alone, how well do you think it would work as a language model?

So we actually tried this early on in the project, and what we found is that, because of the cache misses, the perplexity would just jump to infinity.
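To make the point about cache misses concrete, here is a minimal sketch of why a retrieval-only model can reach infinite perplexity while an interpolated one stays finite. The probabilities, the mixing weight `lam`, and the function names are all hypothetical and chosen for illustration; they are not taken from the episode or from any particular implementation.

```python
import math


def perplexity(probs):
    """Perplexity given the probability assigned to each true next token."""
    total = 0.0
    for p in probs:
        if p == 0.0:
            # log(0) is -infinity, so a single zero makes perplexity unbounded.
            return math.inf
        total += math.log(p)
    return math.exp(-total / len(probs))


def interpolate(p_knn, p_lm, lam):
    """Mix two distributions token by token: lam * p_knn + (1 - lam) * p_lm."""
    return [lam * k + (1.0 - lam) * m for k, m in zip(p_knn, p_lm)]


# Hypothetical probabilities each model assigns to the correct next token.
# The third entry is a "cache miss": no retrieved neighbor had that token,
# so the nearest neighbor model assigns it zero probability.
p_knn = [0.9, 0.8, 0.0, 0.7]
p_lm = [0.3, 0.2, 0.1, 0.25]

knn_only = perplexity(p_knn)
lm_only = perplexity(p_lm)
mixed = perplexity(interpolate(p_knn, p_lm, lam=0.25))

print("kNN only:", knn_only)
print("LM only:", lm_only)
print("interpolated:", mixed)
```

Because the base language model gives every token some nonzero probability, the mixture never assigns zero to the correct token, which is what keeps the interpolated perplexity finite in this toy setup.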
