
128 - Dynamic Benchmarking, with Douwe Kiela
NLP Highlights
The Effects of Different Domains on Model Performance
The models do get stronger and stronger, but the examples that we find get harder and harder. On the final round of Adversarial NLI, I think state-of-the-art performance is still something like 40, which really is very, very difficult for any existing model. What is human performance on the hard examples from those later rounds? We don't know if we actually looked at it, but there's an interesting paper where we look at the different reasoning types. Maybe an even more interesting experiment would be to have humans in the loop on both ends, right? So rather than having humans fool models, maybe humans can fool other humans, and then we'll see what models can do with that data.