#161 Microsoft’s Christian Federmann on the Translation Quality of Large Language Models

SlatorPod

The Circularity of Translation Quality Assessment

In WMT's metrics shared task, the various metrics people propose for that shared task are compared against human performance. The higher the correlation, the better a metric scores, and the best-performing metric is deemed to be most similar to what humans would have decided. This in no way, shape or form denotes any supremacy. It certainly doesn't mean these things are better than human annotators, because that's possibly a pipe dream and hard to achieve. But it was interesting enough from our perspective to showcase that these models can perform super, super well.
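The comparison described here can be sketched roughly as follows. This is a minimal illustration, not the shared task's actual procedure: it assumes Pearson correlation as the statistic (the excerpt does not say which one is used), and the metric names and all scores are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def rank_metrics(human_scores, metric_scores):
    """Rank metrics by how strongly they correlate with human judgments."""
    correlations = {
        name: pearson(scores, human_scores)
        for name, scores in metric_scores.items()
    }
    # Highest correlation first: that metric agrees most with the humans.
    return sorted(correlations.items(), key=lambda item: item[1], reverse=True)


# Hypothetical human quality judgments for five translations.
human = [0.9, 0.4, 0.7, 0.2, 0.6]

# Hypothetical scores from two made-up metrics for the same translations.
metrics = {
    "metric_a": [0.8, 0.5, 0.7, 0.3, 0.6],
    "metric_b": [0.2, 0.9, 0.4, 0.8, 0.5],
}

ranking = rank_metrics(human, metrics)
for name, corr in ranking:
    print(f"{name}: {corr:.3f}")
```

Note that the top-ranked metric is only the one that agrees most with the human scores. The ranking says nothing about whether it is better than the annotators, since their judgments are the reference it is measured against.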
