The Evolution of COMET Metrics

From the very first edition of that metrics shared task, results showed that BLEU was actually never the strongest metric. So when Matt Post released sacreBLEU as a tool, at least that helped the field forward, because it made these scores comparable. Then, I want to say two or three years back, everybody jumped over to looking at embedding-based metrics, and we had a paper called "To Ship or Not to Ship" where we summarized our internal choice of metrics. We typically prefer COMET scores, followed by COMET-QE scores without a reference, followed by chrF, followed by BLEU scores, which we mostly have internally.
