Speech tech and Common Voice at Mozilla

Sep 9, 2020

Guest

Jenny Zhang

Join Jenny Zhang, who works on the Common Voice project at Mozilla; Remy Muhire, a VoiceTech enthusiast; and Josh Meyer, a champion of African language tech. They explore how biases in speech data affect the recognition of languages and accents, and describe Mozilla’s inclusive approach to building an open-source voice database. The trio also discusses the challenges of gathering diverse datasets for marginalized communities, particularly in Sub-Saharan Africa, and emphasizes the need for ethical data practices to support underrepresented languages.

AI Snips

INSIGHT

Common Voice's Origin

In 2017, open-source speech data was scarce, English-centric, and lacked diversity.
Mozilla Common Voice aimed to democratize speech tech by crowdsourcing diverse voice data.

INSIGHT

Data Needs for Speech Recognition

The amount of speech data needed for speech recognition depends on the application's complexity.
While simple tasks may require minimal data, robust models need around 2,000 hours of transcribed speech.

ANECDOTE

Community Validation

Common Voice uses community validation where volunteers determine if audio matches text.
This unorthodox approach prioritizes community involvement and recordings made in diverse noise environments.
