
128 - Dynamic Benchmarking, with Douwe Kiela
NLP Highlights
Dynaboard: A Model Evaluation Platform
Dynaboard is a model-evaluation-as-a-service platform. It allows people to upload their models, and those models then appear on the leaderboard, which means we can always take the top model on the leaderboard, for example. As new rounds come out, you lose track of where we are, so one very elegant way to fix that is to allow people to talk to these models in Dynabench. Do you have any ideas on how you can estimate whether a given question is something that a real user would ask? Yeah, I don't know, right? I mean, there's a lot of interesting analysis to be done there, even by, like, looking at the parse tree depth of a hypothesis in ANLI.