
MLG 018 Natural Language Processing 1

Machine Learning Guide


Using Tokens in Machine Learning

So corpora have documents. Documents have tokens, and then we will operate on these tokens in any number of ways. For example, in preprocessing our documents for use in our machine learning models, we may want to remove junk. So one preprocessing step might be to throw away stop words. Another thing you can do with tokens is reduce them morphologically, that is, reduce morphological variation. The structure of a word is its morphology, and you can reduce that morphology by removing parts of the word. Let's say you want to remove "ing" or "ed" or any of those endings, to reduce all words that have past, present, or future tense, whatever, into just their one base word.
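
The two preprocessing steps described here, throwing away stop words and stripping endings such as "ing" and "ed", can be sketched in a few lines of plain Python. This is a minimal illustration, not the episode's own code: the tiny stop-word list, the function names, and the `min_stem_len` guard are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); lists used in practice are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "or", "of", "to", "in", "on"}

# The endings mentioned in the episode.
SUFFIXES = ("ing", "ed")


def tokenize(document):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", document.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def strip_suffix(token, min_stem_len=3):
    """Remove the first matching suffix, unless the remainder would be too short.

    The min_stem_len guard is an assumption of this sketch; it keeps short
    words such as "bed" from being cut down to "b".
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_len:
            return token[: -len(suffix)]
    return token


def preprocess(document):
    """Tokenize, drop stop words, then strip suffixes from what remains."""
    return [strip_suffix(t) for t in remove_stop_words(tokenize(document))]


print(preprocess("The dog was walking and jumped on the bed"))
# prints ['dog', 'walk', 'jump', 'bed']
```

Blind suffix stripping is crude: "running" comes out as "runn" rather than "run". Real stemmers, such as the Porter stemmer, add further rules to handle cases like that.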
