
128 - Dynamic Benchmarking, with Douwe Kiela
NLP Highlights
How to Integrate Utility and Computing Into a Leaderboard
The basic idea is that everyone has a different utility function and so the metrics we have are performance or accuracy based performance metrics throughput or compute memory robustness and fairness. These ultimately determine for you how much you value something right so how much it is worth to you. You can think of this as trade-offs or exchange rates where you can trade off your compute versus your accuracy. There's the default exchange rate that comes out of how model creators behave on the leaderboard and then if you as a user disagree with that default weighting then you can specify the weights yourself.
00:00
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.