Machine Learning Guide cover image

MLG 032 Cartesian Similarity Metrics

Machine Learning Guide

00:00

T F, I, D F Vectorizers in Cartesian Space

A t f, i, d f vecterize is usually preferred in natural language processing for counting the presence of words in documents. The amount of any one word or another will actually move the document around. If you were to compare two documents that have a lot of words in common, then theys, they should what we call have semantic similarity. But if one document ends up having ten times the word politics as another document, then the result gets skewed. It shifts the point in space,. Even though these two documents are still talking about the same things, they still have a many words in common.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app