
Machine Learning Street Talk (MLST) Facebook Research - Unsupervised Translation of Programming Languages
Jun 24, 2020
Marie-Anne Lachaux, Baptiste Roziere, and Guillaume Lample are researchers at Facebook AI Research (FAIR) in Paris, specializing in the unsupervised translation of programming languages. They discuss their method, which leverages shared embeddings and tokenization to translate source code between languages. The conversation covers the balance between human insight and machine learning in coding, the challenges posed by structural differences between languages, and the collaborative culture that fuels innovation at FAIR.
AI Snips
Unsupervised Translation
- Unsupervised machine translation models learn a shared embedding space for different languages.
- Similar concepts are mapped to similar locations in this space, regardless of the language.
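As a toy illustration of the shared embedding space (the vectors below are made up for illustration, not the model's actual embeddings): analogous concepts from different languages end up near each other, which a simple cosine similarity can show.

```python
import math

# Hypothetical 3-d embeddings: analogous concepts in different
# languages sit near each other in the shared space.
embeddings = {
    ("java",   "ArrayList"): (0.9, 0.1, 0.2),
    ("python", "list"):      (0.8, 0.2, 0.1),
    ("java",   "HashMap"):   (0.1, 0.9, 0.3),
    ("python", "dict"):      (0.2, 0.8, 0.2),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Cross-language pairs for the same concept score high...
sim_same = cosine(embeddings[("java", "ArrayList")], embeddings[("python", "list")])
# ...while unrelated concepts score lower.
sim_diff = cosine(embeddings[("java", "ArrayList")], embeddings[("python", "dict")])
```

In the real model, these proximities emerge from training rather than being hand-assigned, which is what lets the decoder treat "the same concept in another language" as a nearby point.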
Shared Vocabulary
- Shared vocabularies and word piece tokenization help align different languages in unsupervised translation.
- Special language tokens guide the decoder to generate the correct target language.
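A minimal sketch of these two ideas, using a crude regex tokenizer as a stand-in for the word-piece/BPE tokenization the researchers describe (the `<python>` token name is illustrative, not the paper's exact symbol):

```python
import re

def tokenize(code):
    """Crude identifier/number/symbol tokenizer — a stand-in for
    word-piece/BPE, just to show vocabulary overlap across languages."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

java_src   = "if (x > 0) { return x; }"
python_src = "if x > 0: return x"

java_toks = tokenize(java_src)
py_toks   = tokenize(python_src)

# One vocabulary is built over all languages, so common tokens
# (keywords, operators, identifiers) are shared and reuse one embedding.
shared = set(java_toks) & set(py_toks)

# A special language token prepended to the decoder input tells it
# which language to generate (Java in, Python out here).
decoder_input = ["<python>"] + java_toks
```

The shared tokens (`if`, `return`, `x`, `>`, `0`) are what anchor the two languages in the same embedding space, while the language token is the only signal the decoder needs to pick the output language.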
Unsupervised Translator
- The researchers trained an unsupervised translator for programming languages like Java, Python, and C++.
- Previous methods were mostly rule-based, requiring extensive expertise and lacking generalizability.