Machine Learning Street Talk (MLST)

Facebook Research - Unsupervised Translation of Programming Languages

Jun 24, 2020
Marie-Anne Lachaux, Baptiste Roziere, and Guillaume Lample are researchers at Facebook AI Research (FAIR) in Paris, specializing in the unsupervised translation of programming languages. They discuss their method, which leverages shared embeddings and subword tokenization to translate code between programming languages without parallel data. The conversation covers the balance between human insight and machine learning in coding, the challenges posed by structural differences between languages, and the collaborative culture that fuels innovation at FAIR.
INSIGHT

Unsupervised Translation

  • Unsupervised machine translation models learn a shared embedding space for the different languages involved.
  • Similar concepts are mapped to nearby locations in this space, regardless of language (see the sketch below).
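
A minimal numpy sketch of the shared-space idea. The embedding vectors below are invented numbers, not outputs of any trained model; they are chosen only so that equivalent concepts from Python and Java land close together under cosine similarity.

    import numpy as np

    # Toy illustration of a shared embedding space. In a real model these
    # vectors are learned; the numbers here are made up so that equivalent
    # concepts from different languages sit near each other.
    embeddings = {
        "python:len":    np.array([0.90, 0.10, 0.02]),
        "java:length":   np.array([0.88, 0.12, 0.05]),  # close to python:len
        "python:print":  np.array([0.10, 0.90, 0.01]),
        "java:println":  np.array([0.08, 0.92, 0.03]),  # close to python:print
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Equivalent concepts score near 1.0 even though they come from
    # different languages; unrelated ones score much lower.
    print(cosine(embeddings["python:len"], embeddings["java:length"]))
    print(cosine(embeddings["python:len"], embeddings["java:println"]))
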
INSIGHT

Shared Vocabulary

  • Shared vocabularies and word-piece (subword) tokenization help align different languages in unsupervised translation.
  • Special language tokens guide the decoder to generate the correct target language (see the sketch below).
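
A minimal sketch of a shared subword vocabulary plus target-language tokens, using the Hugging Face tokenizers library as a stand-in for whatever tooling the authors actually used; the toy corpus, vocabulary size, and token names such as <python> and <java> are illustrative assumptions, not details from the paper.

    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    # Train one BPE vocabulary over code from several languages, so that
    # identifiers and keywords shared across languages get the same token ids.
    tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

    trainer = trainers.BpeTrainer(
        vocab_size=200,
        special_tokens=["<unk>", "<python>", "<java>", "<cpp>"],
    )
    toy_corpus = [
        "def add(a, b): return a + b",               # Python
        "int add(int a, int b) { return a + b; }",   # Java / C++
    ]
    tokenizer.train_from_iterator(toy_corpus, trainer)

    # Pieces such as "add", "return", "a", "b" are shared across languages,
    # which pulls their embeddings toward the same regions of the space.
    print(tokenizer.encode("return a + b").tokens)

    # A special language token prepended to the decoder input tells the
    # model which language to generate.
    print(tokenizer.token_to_id("<java>"))

Because frequent pieces like "add" and "return" map to the same token ids in every language, the shared vocabulary gives the model anchor points that help the embedding spaces line up.
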
ANECDOTE

Unsupervised Translator

  • The researchers trained an unsupervised translator between programming languages such as Java, Python, and C++.
  • Previous methods were mostly rule-based, requiring extensive expertise and generalizing poorly (see the rule-based sketch below).
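
For contrast, here is a deliberately naive rule-based Python-to-Java rewrite; the function and its single regex rule are hypothetical, written only to show why hand-crafted rules need expert effort for every pattern and do not generalize.

    import re

    # A naive rule-based "Python -> Java" rewrite. It covers one narrow
    # pattern and nothing else: no types other than int, no control flow,
    # no multi-statement bodies, no libraries.
    def python_def_to_java(src: str) -> str:
        m = re.match(r"def (\w+)\((\w+), (\w+)\): return (.+)", src)
        if m is None:
            raise ValueError("no hand-written rule covers this input")
        name, a, b, expr = m.groups()
        # Hard-coded assumptions: two int arguments, a single-expression body.
        return f"static int {name}(int {a}, int {b}) {{ return {expr}; }}"

    print(python_def_to_java("def add(a, b): return a + b"))
    # -> static int add(int a, int b) { return a + b; }

A learned, unsupervised translator is trained from monolingual code instead of rules like the one above, which is what lets it cover patterns no one wrote a rule for.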