#161 Microsoft’s Christian Federmann on the Translation Quality of Large Language Models

SlatorPod

The Evolution of COMET Metrics

From the very first edition of that metrics shared task, results showed that BLEU was actually never the strongest metric. So then when Matt Post released SacreBLEU as a tool, at least that helped the field forward, because that made these scores comparable. Then, I want to say two, three years back, everybody jumped over to looking at embedding-based metrics, and we had a paper called "To Ship or Not to Ship" where we summarized our internal choice of metrics. We typically prefer the COMET scores, followed by COMET-QE scores without a reference, followed by chrF, followed by BLEU scores, which we mostly have internally.
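The preference order described above can be read as a simple fallback rule: use the most preferred metric for which a score exists. The sketch below is only an illustration of that reading, not the speaker's actual tooling. The dictionary keys, the function name, and the example numbers are all hypothetical.

```python
# Preference order as stated in the excerpt, most preferred first.
# The key names are hypothetical labels chosen for this sketch.
METRIC_PREFERENCE = ["comet", "comet_qe", "chrf", "bleu"]


def pick_preferred_metric(available_scores):
    """Return (name, score) for the most preferred metric that has a score."""
    for name in METRIC_PREFERENCE:
        score = available_scores.get(name)
        if score is not None:
            return name, score
    raise ValueError("no known metric score available")


# Made-up numbers, for illustration only: no COMET score is available here,
# so the rule falls back to chrF ahead of BLEU.
example_scores = {"bleu": 31.2, "chrf": 58.4}
print(pick_preferred_metric(example_scores))
```

Scores from different metrics live on different scales, so a rule like this only chooses which metric to consult; it does not make the numbers comparable with one another.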
