

Democratizing ML for speech
Jan 19, 2022
David Kanter, Executive Director at MLCommons, emphasizes the need for evolving speech datasets to advance machine learning. He discusses new initiatives aimed at democratizing access to speech data through increased diversity in languages and speakers. The conversation highlights the essential balance between openness and proprietary innovation in machine learning, as well as the importance of community involvement in creating and maintaining high-quality datasets. Kanter also outlines future innovations and competitions focusing on enhancing data for better machine learning outcomes.
AI Snips
MLCommons Mission
- MLCommons aims to improve machine learning for everyone by stimulating innovation.
- Their focus areas include benchmarks, data sets, and tools to benefit society.
Data Sets as Raw Ingredients
- Data sets are crucial for machine learning advancements, similar to raw materials in other industries.
- Public data sets allow researchers, even at large companies, to share techniques and advance the field.
Criteo Data Set
- Criteo, a European company, opened an older data set for MLCommons benchmarks.
- Criteo benefited indirectly: systems improved using its data set also improved Criteo's own processes.