
Lee Baker, Author of Getting Started With Statistics: A Series of Bitesize Guides For Beginners
Frontmatter
How to Clean Dirty Data
How many different ways are there of spelling the word 'positive'? You can have it spelled all lowercase or all uppercase, you can leave the 'e' off the end, or you can add a full stop. And how many different ways are there of misspelling the word 'negative'? There are an infinite number of ways to write down the word 'positive'. So somebody like me has got to go in and clean up this dirty data. The best way of doing it is actually to use various artificial intelligence techniques. One possible way is using something called fuzzy matching.
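As a rough illustration of the fuzzy matching idea mentioned above, here is a minimal Python sketch. It is not the speaker's own method: it assumes the standard-library difflib module as the matcher, and the function name normalize_label, the list of canonical labels, and the 0.8 similarity cutoff are all hypothetical choices made for this example.

```python
import string
from difflib import get_close_matches

# Assumed set of "clean" values the dirty entries should map onto.
CANONICAL_LABELS = ("positive", "negative")


def normalize_label(raw, labels=CANONICAL_LABELS, cutoff=0.8):
    """Map a messy entry to the closest canonical label, or None.

    The cutoff (0..1) is an assumed threshold: higher values demand a
    closer match before an entry is accepted.
    """
    # Remove case differences, surrounding whitespace and stray
    # punctuation such as a trailing full stop.
    cleaned = raw.lower().strip(string.whitespace + string.punctuation)
    # get_close_matches returns candidates ordered by similarity,
    # best first; n=1 keeps only the best one.
    matches = get_close_matches(cleaned, labels, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for entry in ["POSITIVE", "positiv", "Positive.", "negitive", "unknown"]:
        print(repr(entry), "->", normalize_label(entry))
```

Entries that fall below the cutoff come back as None rather than being forced into a category, so they can be set aside for manual review.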


