T F, I, D F Vectorizers in Cartesian Space

A t f, i, d f vecterize is usually preferred in natural language processing for counting the presence of words in documents. The amount of any one word or another will actually move the document around. If you were to compare two documents that have a lot of words in common, then theys, they should what we call have semantic similarity. But if one document ends up having ten times the word politics as another document, then the result gets skewed. It shifts the point in space,. Even though these two documents are still talking about the same things, they still have a many words in common.

Play episode from 29:21

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app