
128 - Dynamic Benchmarking, with Douwe Kiela

NLP Highlights


Dynaboard: A Model Evaluation Platform

Dynaboard is a model-evaluation-as-a-service platform. It allows people to upload their models, and those models then appear on the leaderboard, which means we can always take, for example, the top model on the leaderboard. As new rounds come out, you lose track of where we are, so one very elegant way to fix that is to allow people to talk to these models in Dynabench. Do you have any ideas on how you could estimate whether a given question is something that a real user would ask? Yeah, I don't know, right? I mean, there's a lot of interesting analysis to be done there, even by, like, looking at the parse tree depth of a hypothesis in ANLI.
