
123 - Robust NLP, with Robin Jia
NLP Highlights
00:00
The Importance of Label Imbalance in Active Learning
Yes, so in general, when we talk about label imbalance, I think the simplest case to think about is binary classification. We at least want to train on somewhat balanced datasets; that just makes model training work better. But the actual distribution you care about might be extremely imbalanced, in that one label might be much more common than the other. In QA this also arises whenever you think about open-domain question answering: there are tons of documents on the web, and very few of them, perhaps none of them, actually answer the question the user has asked. So if you're trying to use a system to detect whether some new piece of writing is a duplicate
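The gap between a balanced training set and an extremely imbalanced real distribution can be illustrated with a small Python sketch. It assumes a hypothetical list of `(text, label)` pairs and uses random downsampling of the majority class. This is only one simple way to balance training data, not a method described in the episode; class reweighting or oversampling are common alternatives.

```python
import random
from collections import Counter


def downsample_to_balance(examples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Every label is randomly downsampled to the size of the rarest label.
    Illustrative helper only (hypothetical name), not from the episode.
    """
    # Group examples by their label.
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))

    # The rarest label determines how many examples each label keeps.
    n_min = min(len(group) for group in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid label-sorted batches during training
    return balanced


# Hypothetical toy data: 1 positive ("duplicate") pair per 99 negatives.
data = [(f"pair-{i}", 1 if i % 100 == 0 else 0) for i in range(1000)]
print(Counter(label for _, label in data))

train = downsample_to_balance(data)
print(Counter(label for _, label in train))
```

Only the training split should be rebalanced this way; a held-out evaluation set would keep its natural imbalance, since that is the distribution the system will actually face.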