The Importance of Data Labeling in Machine Learning

The number of questions you can formulate, even just in English, is really vast. One question that's still open in my mind is: did we get a large enough data set after bootstrapping? Ten thousand questions is a lot of language. As Max mentioned, our refusal classification was pretty decent, at about 96%. So I don't know if that is going to destroy the performance. Maybe it would. That's another thing to experiment with.

