How to Clean and Verify a Data Set
We have a quality audit process that sends samples of places to human moderators. Then we have a lot of machine learning models that work from that labeled training data and the input sources, and they give us confidence scores on whether or not a place from a given source is real. There are bad sources, and some sources we use turn out to be bad, so we need to know how to down-weight or discard that data before it makes it into the final product. We have to take that small number of labels and develop methodologies that make them valid across the 205 million places in our data set.
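One way to picture the idea of stretching a few human labels across many places is the minimal sketch below. It is an illustration only, not the pipeline described in the episode. It estimates a reliability score per source from a small audited sample, then scores every place by the sources that reported it. All names here (`source_reliability`, `place_confidence`, `filter_places`, the 0.6 threshold) are hypothetical, and the smoothing and noisy-OR combination are assumptions chosen for simplicity. The speaker mentions machine learning models without saying which ones.

```python
from collections import defaultdict


def source_reliability(audit_labels, prior_real=1.0, prior_fake=1.0):
    """Estimate how often each source reports a real place.

    audit_labels: iterable of (source, is_real) pairs from human moderators.
    The priors add Laplace-style smoothing so that a source with only a
    handful of audited places is not pushed to exactly 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [real_count, total_count]
    for source, is_real in audit_labels:
        counts[source][0] += int(is_real)
        counts[source][1] += 1
    return {
        source: (real + prior_real) / (total + prior_real + prior_fake)
        for source, (real, total) in counts.items()
    }


def place_confidence(sources, reliability, default=0.5):
    """Combine per-source reliability into one score for a place.

    Assumption: sources err independently (noisy-OR), so the place is
    treated as fake only if every source reporting it is wrong.
    Sources that were never audited fall back to `default`.
    """
    p_all_wrong = 1.0
    for source in sources:
        p_all_wrong *= 1.0 - reliability.get(source, default)
    return 1.0 - p_all_wrong


def filter_places(places, reliability, threshold=0.6):
    """Keep places whose confidence meets the threshold; discard the rest.

    places: dict mapping place id -> list of sources that reported it.
    Returns a dict of place id -> confidence for the places kept.
    """
    kept = {}
    for place_id, sources in places.items():
        confidence = place_confidence(sources, reliability)
        if confidence >= threshold:
            kept[place_id] = confidence
    return kept
```

In this sketch a bad source is down-weighted rather than banned outright: a place reported only by an unreliable source falls below the threshold, while the same place survives if a reliable source also reports it. The independence assumption is a strong one and would likely overstate confidence when sources copy from each other.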