
MLG 032 Cartesian Similarity Metrics
Machine Learning Guide
00:00
T F, I, D F Vectorizers in Cartesian Space
A t f, i, d f vecterize is usually preferred in natural language processing for counting the presence of words in documents. The amount of any one word or another will actually move the document around. If you were to compare two documents that have a lot of words in common, then theys, they should what we call have semantic similarity. But if one document ends up having ten times the word politics as another document, then the result gets skewed. It shifts the point in space,. Even though these two documents are still talking about the same things, they still have a many words in common.
Transcript
Play full episode