
128 - Dynamic Benchmarking, with Douwe Kiela
NLP Highlights
The Effect of Prompts on Sentiments
The hard examples in NLI and the stress test datasets: can you talk about that piece?
Yeah. We also evaluated the models on those. I think they are very high variance, so it's sometimes not easy to draw real conclusions about what's going on there. What we found is that if you train on this data, you also get better at the phenomena measured by the stress tests. But, as I said, I think that should come with the caveat that I don't know what they measure anyway.
Okay. Are there any other trends? You mentioned the DynaSent work, and I know that you have a couple of other adversarially collected datasets as well. In general, are there any interesting trends in any of those?