Practical AI

Democratizing ML for speech

Jan 19, 2022
David Kanter, Executive Director at MLCommons, emphasizes the need for evolving speech datasets to advance machine learning. He discusses new initiatives aimed at democratizing access to speech data through increased diversity in languages and speakers. The conversation highlights the essential balance between openness and proprietary innovation in machine learning, as well as the importance of community involvement in creating and maintaining high-quality datasets. Kanter also outlines future innovations and competitions focusing on enhancing data for better machine learning outcomes.
INSIGHT

MLCommons Mission

  • MLCommons aims to improve machine learning for everyone by stimulating innovation.
  • Its focus areas include benchmarks, data sets, and tools to benefit society.
INSIGHT

Data Sets as Raw Ingredients

  • Data sets are crucial for machine learning advancements, similar to raw materials in other industries.
  • Public data sets allow researchers, even at large companies, to share techniques and advance the field.
ANECDOTE

Criteo Data Set

  • Criteo, a European company, opened an older data set for MLCommons benchmarks.
  • Criteo benefited indirectly: systems that improved on its data set also improved its own processes.