Natural Language Processing and How ML Models Understand Text

The Real Python Podcast

Using Binary Vectorization to Train Machine Learning Models

This is called binary vectorization, and it's basically the stupidest, most naive approach you can take. But it does actually get you results if you've got pretty good separation between your documents. So one thing you can do to clean up is just include the n most common words in your collection of documents. OK? And I want to say it's a dumb approach, but sometimes dumb is fine, true.
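
To make the idea concrete, here is a minimal sketch of binary vectorization restricted to the n most common words. It is not taken from the episode: the function names, the whitespace tokenizer, the sample documents, and the choice to rank words by total occurrences across the collection are all assumptions made for illustration.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    # Assumed tokenizer: split on whitespace, strip edge punctuation, lowercase.
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]


def build_vocabulary(documents, n):
    # Keep the n most common words, ranked by total count over all documents.
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return [word for word, _ in counts.most_common(n)]


def binary_vectorize(document, vocabulary):
    # 1 if the vocabulary word appears in the document at all, otherwise 0.
    # How many times it appears is deliberately ignored.
    present = set(tokenize(document))
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical sample documents.
docs = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "Stocks fell as markets closed.",
]

vocab = build_vocabulary(docs, n=3)
vectors = [binary_vectorize(doc, vocab) for doc in docs]

for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

Each document becomes a fixed-length list of zeros and ones that can be fed to a classifier. If scikit-learn is available, `CountVectorizer(binary=True, max_features=n)` should give a comparable representation, though its tokenization rules differ from this sketch.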
