128 - Dynamic Benchmarking, with Douwe Kiela

NLP Highlights

CHAPTER

The Effects of Different Domains on Model Performance

The models do get stronger and stronger, but the examples that we find get harder and harder. In the final round of Adversarial NLI, I think state-of-the-art performance is still something like 40, which really is very, very difficult for any existing model. What is human performance on the hard examples of those later on? We don't know if we actually looked at it. There's an interesting paper where we look at the different reasoning types. Maybe an even more interesting experiment would be to have humans in the loop on both ends, right? So rather than having humans fool models, maybe humans can fool other humans, and then we'll see what models can do with that data.
