

Neural Synthesis of Binaural Speech From Mono Audio with Alexander Richard - #514
Aug 30, 2021
In this discussion, Alexander Richard, a research scientist at Facebook Reality Labs and ICLR Best Paper Award winner, shares insights into his groundbreaking work on binaural audio synthesis. He dives into the challenges of audio representation in noisy environments and the complex process of generating realistic spatial audio from mono sources. Richard also highlights the difficulties of dynamic time warping and the need for accurate 3D measurements in virtual reality. His thoughts on Codec Avatars and future research directions promise to reshape how we experience sound and presence in virtual spaces.
AI Snips
Accidental AI Career
- Alexander Richard stumbled into machine learning while studying in Germany.
- He was fascinated by a speech recognition demo and pursued that field, eventually landing at Facebook Reality Labs.
Facebook Reality Labs Mission
- Facebook Reality Labs emerged from Oculus Research and focuses on social telepresence in AR/VR.
- Their mission is to enable realistic, 3D conversations in virtual reality.
Audio's Power in AR/VR
- Audio provides cues that visual data often lacks, especially in VR/AR settings.
- Audio can help fill in gaps in visual information, like lip movements obscured by beards or headsets.