
75 - Reinforcement / Imitation Learning in NLP, with Hal Daumé III
NLP Highlights
00:00
The Cost of Initial Learning for NLP
The incumbent techniques these days are sequence-to-sequence models and their variants trained on maximum likelihood loss. And so now it's really a question of, is the test-time behavior of this model being substantially hurt by the fact that it gets into areas of the search space, or basically makes errors, that it doesn't know how to recover from? So if you have a part-of-speech tagger that's getting 95% accuracy, is this worth doing? Probably not. You should just stick with your maximum likelihood trained thing, because there's not much headroom. If your thing is 95% accurate, 95% of the time it's making the same predictions as the expert anyway.
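The error-compounding worry described above can be sketched as a toy simulation: suppose per-token accuracy is high when the previous prediction was correct, but drops once the model is conditioning on its own earlier mistake, a state it rarely saw during maximum likelihood training. The numbers here are illustrative assumptions, not from the episode.

```python
import random

def simulate(seq_len, acc_given_good=0.95, acc_given_bad=0.60, trials=10_000):
    """Toy model of exposure bias: accuracy on the next token depends on
    whether the previous prediction was right. Both accuracy values are
    made-up illustrative parameters."""
    correct = 0
    total = 0
    for _ in range(trials):
        prev_ok = True  # sequences start from a clean (gold-like) state
        for _ in range(seq_len):
            p = acc_given_good if prev_ok else acc_given_bad
            ok = random.random() < p
            correct += ok
            total += 1
            prev_ok = ok
    return correct / total

random.seed(0)
print(f"effective per-token accuracy: {simulate(seq_len=20):.3f}")
```

Because errors push the model into the low-accuracy regime, the effective per-token accuracy lands below the 95% it achieves from a correct prefix, which is the gap that reinforcement/imitation learning methods try to close.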