

SOTA machine translation at Unbabel
Dec 13, 2022
José Souza and Ricardo Rei from Unbabel dive into the cutting-edge world of machine translation. They discuss the importance of quality estimation and innovations like COMET, which blends human expertise with automated evaluation. The duo highlights advancements in translation assessment methods, moving beyond traditional metrics like BLEU to more accurate systems. They also explore the future of translation technology and the challenges of integrating quality predictions, making it a fascinating listen for anyone interested in AI's impact on language.
AI Snips
Quality Estimation at Unbabel
- Unbabel's quality estimation goes beyond simple confidence scores.
- It uses a separate metric, run on model output, providing a more nuanced evaluation.
COMET's Advantages
- COMET uses embeddings from large pretrained language models, such as XLM-RoBERTa, to compare words semantically.
- Unlike other metrics, it fine-tunes these embeddings for machine translation evaluation, which improves accuracy.
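
As a rough illustration of how a COMET score might be obtained in practice, the sketch below assumes the open-source `unbabel-comet` Python package and the publicly released `Unbabel/wmt22-comet-da` checkpoint; neither is named in the episode summary, and the helper names `build_comet_inputs` and `score_with_comet` are hypothetical. Calling `score_with_comet` downloads a large model, so treat it as a starting point rather than a description of Unbabel's production setup.

```python
from typing import Dict, List


def build_comet_inputs(sources: List[str], translations: List[str],
                       references: List[str]) -> List[Dict[str, str]]:
    """Pair each source, MT output and reference into src/mt/ref dicts."""
    if not (len(sources) == len(translations) == len(references)):
        raise ValueError("sources, translations and references must be the same length")
    return [{"src": s, "mt": m, "ref": r}
            for s, m, r in zip(sources, translations, references)]


def score_with_comet(samples: List[Dict[str, str]],
                     model_name: str = "Unbabel/wmt22-comet-da"):
    """Score samples with a pretrained COMET checkpoint (assumed model name)."""
    # Imported lazily so build_comet_inputs works without the package installed.
    from comet import download_model, load_from_checkpoint

    model = load_from_checkpoint(download_model(model_name))
    output = model.predict(samples, batch_size=8, gpus=0)  # gpus=0 runs on CPU
    return output.scores, output.system_score
```

A call such as `score_with_comet(build_comet_inputs(srcs, mts, refs))` would return one score per segment plus a corpus-level average.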
Embedding Challenges
- Ricardo Rei highlighted a challenge: different words can have very similar embeddings, which can cause issues for the metric.
- An example is mistranslating "50 cents" as "500 cents": the two phrases have similar embeddings but significantly different meanings.
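
The "50 cents" versus "500 cents" problem can be reproduced in miniature. The toy sketch below does not use XLM-RoBERTa or COMET; as a loudly stated assumption, character trigram counts stand in for embeddings, purely to show how two strings can look nearly identical to a similarity measure while the quantity they express differs by a factor of ten. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def numbers_in(text: str) -> list:
    """Extract the integer quantities mentioned in a string."""
    return [int(n) for n in re.findall(r"\d+", text)]


reference = "50 cents"
mistranslation = "500 cents"

similarity = cosine(trigram_vector(reference), trigram_vector(mistranslation))
print(f"surface similarity: {similarity:.2f}")
print("quantities:", numbers_in(reference), "vs", numbers_in(mistranslation))
```

The similarity stays high because most of the string is shared, even though the amounts differ. A learned metric faces a subtler version of the same issue when the embeddings of the two numbers sit close together.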