Feature Processing for Text Analytics
Linear Digressions
What Do You Know About Bag of Words?
Even if you remove the stop words, there will probably still be pretty common words that show up a lot. There's a particular type of vectorization that you can do if you want to put a little more emphasis on words that are infrequent. So instead of just getting a binary or simple counting measure for each word, what you get is a weight for each word. That's a combination of how often the word occurs in this document and how rare the word is in the entire corpus overall.
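The weighting described here is commonly known as TF-IDF (term frequency times inverse document frequency). As a rough sketch of the idea, not the hosts' own code, the snippet below multiplies each word's count in a document by the log of how rare the word is across the corpus. The function name `tf_idf`, the whitespace tokenization, the unsmoothed `log(N / df)` formula, and the toy corpus are all assumptions made for illustration; real libraries usually add smoothing and normalization.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    weight = (count of the word in the document)
             * log(number of documents / number of documents containing the word)
    """
    # Naive tokenization (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {word: count * math.log(n_docs / doc_freq[word])
             for word, count in counts.items()}
        )
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
weights = tf_idf(corpus)
```

In this toy corpus "the" appears in every document, so its weight falls to zero even though it is the most frequent word, while "mat", which appears in only one document, ends up weighted above "cat", which appears in two.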
Transcript


