Feature Processing for Text Analytics
Linear Digressions
Getting Rid of Stop Words in Text Processing
So if you want to represent your entire data set, what you're going to have is all of these thousand-dimensional vectors times however many documents you have in your corpus. So it'll be a thousand by 20, or by 50, or by a hundred, or by a million, however many documents you have. You're going to just make a matrix where you stack all those vectors on top of each other and stick them into your machine learning algorithm. And the zero is back to nothing. If you want to go the other way, you can't necessarily reconstruct the entire document, but you can at the very least reconstruct, of all the possible words, which ones are in which documents - that
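A rough sketch of the idea in this excerpt, assuming a simple word-count representation: each document becomes one vector with a slot per vocabulary word, the vectors are stacked into a matrix, and going the other way recovers only which words occur, not their order. The three toy documents and the helper names (build_vocabulary, vectorize, words_present) are made up for illustration and are not from the episode; a real vocabulary would have something like the thousand words mentioned above.

```python
def build_vocabulary(documents):
    """Map each distinct word to a column index (hypothetical helper)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Turn one document into a count vector with one slot per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


def words_present(vector, vocabulary):
    """Go the other way: recover which words occur. Word order is lost."""
    index_to_word = {index: word for word, index in vocabulary.items()}
    return {index_to_word[i] for i, count in enumerate(vector) if count > 0}


# Toy corpus, invented for this example.
documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a dog barked",
]

vocabulary = build_vocabulary(documents)

# Stack one vector per document: the matrix has one row per document
# and one column per vocabulary word.
matrix = [vectorize(doc, vocabulary) for doc in documents]

print(len(matrix), "documents by", len(matrix[0]), "vocabulary words")
print(sorted(words_present(matrix[2], vocabulary)))
```

The matrix rows are what would be handed to a machine learning algorithm. Note that words_present returns a set: the original sentence cannot be rebuilt from the vector, only the membership of words in each document.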
Transcript


