Practical AI

SOTA machine translation at Unbabel

Dec 13, 2022
José Souza and Ricardo Rei from Unbabel dive into the cutting-edge world of machine translation. They discuss the importance of quality estimation and innovations like COMET, which blends human expertise with automated evaluation. The duo highlights advancements in translation assessment methods, moving beyond traditional metrics like BLEU to more accurate systems. They also explore the future of translation technology and the challenges of integrating quality predictions, making it a fascinating listen for anyone interested in AI's impact on language.
INSIGHT

Quality Estimation at Unbabel

  • Unbabel's quality estimation goes beyond simple confidence scores.
  • It runs a separate metric on the model's output, which gives a more nuanced evaluation than the model's own confidence.
INSIGHT

COMET's Advantages

  • COMET uses embeddings from large pretrained language models, such as XLM-RoBERTa, to compare words semantically.
  • Unlike other metrics, it fine-tunes these embeddings for machine translation evaluation, improving accuracy.
ANECDOTE

Embedding Challenges

  • Ricardo Rei highlighted a challenge: different words can have very similar embeddings, which can hide real errors.
  • An example is mistranslating "50 cents" as "500 cents": the two phrases have similar embeddings but significantly different meanings.
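
The "50 cents" versus "500 cents" problem can be illustrated with a minimal sketch. It does not use COMET or XLM-RoBERTa. As a stand-in for learned embeddings, it assumes a simple character-trigram count vector, and the sentences and the helper names (`char_ngrams`, `cosine`, `numbers_match`) are hypothetical, chosen only for this illustration. The point is that two strings can score as highly similar while a plain numeric check still catches the mismatch.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string.

    This is a stand-in for a learned embedding, not what COMET does.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_numbers(text):
    """Return every integer or decimal number that appears in the text."""
    return re.findall(r"\d+(?:\.\d+)?", text)


def numbers_match(reference, hypothesis):
    """True when both texts contain exactly the same numbers."""
    return sorted(extract_numbers(reference)) == sorted(extract_numbers(hypothesis))


# Hypothetical reference and machine translation output.
reference = "50 cents"
hypothesis = "500 cents"

similarity = cosine(char_ngrams(reference), char_ngrams(hypothesis))
print(f"similarity: {similarity:.2f}")
print(f"numbers match: {numbers_match(reference, hypothesis)}")
```

Here the similarity is high even though the amount differs by a factor of ten, while the numeric check flags the pair. A check like this is only a narrow guard for numbers; it says nothing about how Unbabel handles such cases in practice.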