

Speech tech and Common Voice at Mozilla
Sep 9, 2020
Join Jenny Zhang from Mozilla, focused on the Common Voice project, Remy Muhire, passionate about VoiceTech, and Josh Meyer, who champions African language tech. They explore the biases in speech data affecting language and accent recognition. Discover Mozilla’s inclusive approach to creating an open-source voice database. The trio also discusses challenges in gathering diverse datasets for marginalized communities, particularly in Sub-Saharan Africa, and emphasizes the need for ethical data practices to support underrepresented languages.
AI Snips
Common Voice's Origin
- In 2017, open-source speech data was scarce and English-centric, and it lacked diversity.
- Mozilla Common Voice aimed to democratize speech tech by crowdsourcing diverse voice data.
Data Needs for Speech Recognition
- The amount of speech data needed for speech recognition depends on the application's complexity.
- While simple tasks may require minimal data, robust models need around 2,000 hours of transcribed speech.
Community Validation
- Common Voice uses community validation where volunteers determine if audio matches text.
- This unorthodox approach prioritizes community involvement and keeps recordings made in diverse noise environments.
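
To make the community validation idea concrete, here is a minimal sketch of how volunteer votes on a single clip could be tallied. The function name `validate_clip`, the `min_votes` threshold of 2, and the "pending" state are illustrative assumptions, not Common Voice's actual rules or code.

```python
def validate_clip(votes, min_votes=2):
    """Tally volunteer votes for one recorded clip.

    votes: list of bools; True means the volunteer judged that the
           audio matches the prompt text.
    min_votes: hypothetical threshold (an assumption for this sketch),
               the number of agreeing votes needed to settle a clip.
    Returns "valid", "invalid", or "pending".
    """
    yes = sum(1 for v in votes if v)
    no = len(votes) - yes

    # Accept once enough volunteers agree that the audio matches the text.
    if yes >= min_votes and yes > no:
        return "valid"
    # Reject once enough volunteers agree that it does not.
    if no >= min_votes and no > yes:
        return "invalid"
    # Otherwise the clip stays in the queue for more listeners.
    return "pending"


if __name__ == "__main__":
    print(validate_clip([True, True]))
    print(validate_clip([True, False]))
    print(validate_clip([False, False, True]))
```

Under these assumptions, a clip with one vote each way stays pending until another volunteer breaks the tie, so no single listener decides the outcome alone.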