Feature Processing for Text Analytics
Linear Digressions
Using Bigrams and Trigrams in Machine Learning
Bag of words is unaware of anything about order or relationships between the words, except to say that maybe they occur in documents together more often. So one of the other things that you can do when you're representing text data is you can form what are called n-grams. And with n-grams, n can vary: there can be one-grams, I guess those are unigrams, that's like a single word. Bigrams are pairs of words that occur together. Trigrams are three words in a row, and so on. Usually you don't go higher than maybe four- or five-grams in practice. But the idea there is now, instead of encoding words,
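
As a rough illustration of the n-grams described above, here is a minimal Python sketch. The helper name `ngrams`, the sample sentence, and the whitespace tokenization are assumptions made for illustration and do not come from the episode.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, split on whitespace (a simplifying assumption).
text = "the cat sat on the mat"
tokens = text.split()

unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # three words in a row

# Counting n-grams gives features that keep some local word order,
# which a plain bag of words (unigram counts only) discards.
bigram_counts = Counter(bigrams)
```

Here `bigrams` holds adjacent pairs such as `("the", "cat")` and `("cat", "sat")`, so "the cat" and "cat the" would become different features, whereas a bag of words would treat them identically.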


