SOTA machine translation at Unbabel

Dec 13, 2022

Guests

José Souza

Ricardo Rei

José Souza and Ricardo Rei from Unbabel discuss the state of the art in machine translation. They cover the importance of quality estimation and innovations like COMET, an automated evaluation metric trained on human quality judgments. The two describe how translation assessment has moved beyond traditional metrics like BLEU toward more accurate learned metrics. They also explore the future of translation technology and the challenges of integrating quality predictions, making this a useful listen for anyone interested in AI's impact on language.

INSIGHT

Quality Estimation at Unbabel

Unbabel's quality estimation goes beyond the translation model's own confidence scores.
A separate metric is run on the model's output, which gives a more nuanced evaluation of each translation.

INSIGHT

COMET's Advantages

COMET uses embeddings from large pretrained language models, like XLM-RoBERTa, to compare words semantically.
Unlike other metrics, it fine-tunes these embeddings for machine translation evaluation, improving accuracy.

ANECDOTE

Embedding Challenges

Ricardo Rei highlighted a challenge: different words can have very similar embeddings, which can hide real errors.
An example is mistranslating "50 cents" as "500 cents". The two phrases have similar embeddings but significantly different meanings.
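
To make the "50 cents" problem concrete, here is a minimal sketch of how an embedding comparison can rate the two phrases as nearly identical. The vectors are invented for illustration and do not come from XLM-RoBERTa or COMET; `EMBEDDINGS`, `cosine_similarity`, and `similarity` are hypothetical names used only in this example.

```python
import math

# Hypothetical 4-dimensional embeddings, made up for illustration only.
# Real XLM-RoBERTa embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "50 cents": [0.82, 0.41, 0.30, 0.10],
    "500 cents": [0.80, 0.43, 0.28, 0.16],
    "a red bicycle": [0.05, 0.10, 0.90, -0.40],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity(phrase_a, phrase_b):
    """Look up both toy embeddings and compare them."""
    return cosine_similarity(EMBEDDINGS[phrase_a], EMBEDDINGS[phrase_b])


if __name__ == "__main__":
    # The two amounts differ by a factor of ten, yet the toy vectors
    # point in almost the same direction.
    print(similarity("50 cents", "500 cents"))
    # An unrelated phrase scores much lower.
    print(similarity("50 cents", "a red bicycle"))
```

With these toy vectors, a metric that relied only on embedding similarity would treat the mistranslation as almost perfect, which is the kind of failure the snip describes.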
