How to Make Sure Your Training Data Is as Clean as Possible
It's a more difficult problem than you'd imagine. Even for very simple prompts, we're not […], like producing squares in the exact same locations. Different types of mitigations can actually run counter to each other. So if you remove a large fraction of this kind of content from your data set, then you actually end up with less representation of women in your data set. So it's a complicated series of trade-offs. We're not trying to claim we've fully solved it. I think there's still a lot of work to be done.
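The trade-off described here can be illustrated with a toy calculation. This is a hypothetical sketch, not the speaker's actual pipeline. The dataset, the `flagged` field, and the `group_shares` helper are all invented for illustration. The sketch shows how a filter that removes one group's examples at a higher rate shifts that group's share of what remains.

```python
from collections import Counter


def group_shares(records):
    """Return each group's fraction of the given records."""
    counts = Counter(r["group"] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Hypothetical toy dataset (illustrative only, not real data).
# Assumption: the content filter flags 3 of 5 "women" examples
# but only 1 of 5 "men" examples.
dataset = (
    [{"group": "women", "flagged": True}] * 3
    + [{"group": "women", "flagged": False}] * 2
    + [{"group": "men", "flagged": True}] * 1
    + [{"group": "men", "flagged": False}] * 4
)

# Apply the mitigation: drop every flagged example.
filtered = [r for r in dataset if not r["flagged"]]

before = group_shares(dataset)
after = group_shares(filtered)
print(f"women before: {before['women']:.2f}, after: {after['women']:.2f}")
# prints "women before: 0.50, after: 0.33"
```

Even though the filter targets content rather than people, the group's share drops from half to a third, which is the kind of side effect that makes one mitigation work against another.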