The Effects of Different Domains on Model Performance

The models do get stronger and stronger, but the examples that we find get harder and harder. On the final round of Adversarial NLI, I think state-of-the-art performance is still something like 40 percent, which really is very, very difficult for any existing model.

What is human performance on the hard examples from those later rounds?

I don't know if we actually looked at that. There's an interesting paper where we looked at the different reasoning types. Maybe an even more interesting experiment would be to have humans in the loop on both ends, right? So rather than having humans fool models, maybe humans can fool other humans, and then we'll see what models can do with that data.