How to Make Sure Your Training Data Is as Clean as Possible

It's a more difficult problem than you'd imagine. Even for very simple prompts, we're not unajes, like producing squares in the exact same locations. Different types of mitigations actually will run counter to each other. And so if you remove a large fraction of kind of tis content your data set, then you actually have less woman representation an your data set. So it's actually a complicated series of trade offs. K wewere not trying to claim we'v fully solved it. I think there's still a lot of work to be done.

Play episode from 24:18

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app