
[22] Graham Neubig - Unsupervised Learning of Lexical Information
The Thesis Review
Scaling Up N-Gram Language Models
N-grams are theoretically insufficient, whereas with transformers we can't say that's the case. So it would be tempting to just keep scaling these up, but at some point someone had to think we might actually need a new approach. Yeah, but it sounds like a mix, in the sense that it was somewhat unexpected, but you had some sense that something was wrong, at least with the phrase-based paradigm.
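For context on the "theoretically insufficient" point: an n-gram language model predicts each word only from the previous n-1 words, estimated from corpus counts, so no amount of scaling lets it capture dependencies longer than its fixed window. A minimal bigram (n=2) sketch in Python, with an illustrative toy corpus (not from the episode):

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count bigram occurrences over tokenized sentences."""
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        for prev, curr in zip(tokens, tokens[1:]):
            bigrams[prev][curr] += 1
    return bigrams

def bigram_prob(bigrams, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev)."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][curr] / total if total else 0.0

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram_lm(corpus)
print(bigram_prob(model, "the", "cat"))  # 0.5: "the" is followed by "cat" once out of two
```

The fixed context window is the structural limit under discussion: conditioning on only the previous word (or previous n-1 words) cannot model long-range structure, whereas a transformer's attention can, in principle, condition on the whole preceding sequence.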