Is There Any Way to Mitigate Misspecification From Training Data?

There's a lot of work trying to answer that question, and I would say there are some good examples. If you want to reduce the risk of misspecification from training data, one thing you could do is divorce the data that the language model draws on from the language model itself: you just change the corpus that the retriever model draws upon. I do think it also brings us back to the point we discussed earlier about foresight: what could possibly go wrong, and what do we actually want from this training data? What does good speech look like?
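
To make the idea of separating the corpus from the model concrete, here is a minimal sketch, not anything described in the episode. It assumes a toy word-overlap retriever and a hypothetical `build_prompt` helper standing in for whatever a real system would pass to a fixed language model; the class name, function name, and example passages are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Retriever:
    """Toy retriever (an assumption): ranks passages by word overlap with the query."""
    corpus: list[str]

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_words = set(query.lower().split())
        # Sort passages by how many query words each one shares, best first.
        ranked = sorted(
            self.corpus,
            key=lambda passage: len(query_words & set(passage.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(query: str, retriever: Retriever) -> str:
    """Assemble the text a hypothetical, unchanged language model would see."""
    context = "\n".join(retriever.retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented example passages, purely illustrative.
old_corpus = ["the launch date is in march", "support is by email only"]
new_corpus = ["the launch date is in june", "support is by email and chat"]

retriever = Retriever(old_corpus)
prompt_before = build_prompt("when is the launch date", retriever)

# Swap the corpus; nothing about the model is retrained or modified.
retriever.corpus = new_corpus
prompt_after = build_prompt("when is the launch date", retriever)
```

The point of the sketch is only that the knowledge source is a replaceable attribute: correcting or curating the corpus changes what the model is given to draw on without touching the model's weights.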

Play episode from 43:23
