
Lee Baker, Author of Getting Started With Statistics: A Series of Bitesize Guides For Beginners


How to Clean Dirty Data

How many different ways are there of spelling the word "positive"? You can have it spelled all lowercase or all uppercase. You can leave the e off the end, or add a full stop. And how many different ways are there of misspelling the word "negative"? There are an infinite number of ways to write down the word "positive". So somebody like me has got to go in and clean up this dirty data. The best way of doing it is actually to use various artificial intelligence techniques. One possible way is using something called fuzzy matching.
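
As a rough illustration of the fuzzy matching idea mentioned above, here is a minimal Python sketch. The episode does not name a tool, so the use of the standard library's difflib, the label list CANONICAL, the helper name clean_label, and the 0.8 similarity cutoff are all assumptions made for this example, not the speaker's method.

```python
import difflib

# Hypothetical set of valid labels; a real dataset would define its own.
CANONICAL = ["positive", "negative"]


def clean_label(raw, cutoff=0.8):
    """Return the closest canonical label for a messy entry, or None.

    The cutoff (0.8 here, an assumed value) is the minimum similarity
    ratio that difflib requires before accepting a match.
    """
    # Simple normalisation first: trim spaces, lowercase, drop a trailing full stop.
    normalized = raw.strip().lower().rstrip(".")
    # Then fuzzy-match what is left against the known labels.
    matches = difflib.get_close_matches(normalized, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for entry in ["POSITIVE", "positiv", "Positive.", "negitive", "banana"]:
        print(repr(entry), "->", clean_label(entry))
```

Entries that are not close enough to any known label come back as None, so they can be set aside for manual review rather than silently forced into a category.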

