Democratizing ML for speech

Jan 19, 2022

David Kanter, Executive Director at MLCommons, emphasizes the need for evolving speech datasets to advance machine learning. He discusses new initiatives aimed at democratizing access to speech data through increased diversity in languages and speakers. The conversation highlights the essential balance between openness and proprietary innovation in machine learning, as well as the importance of community involvement in creating and maintaining high-quality datasets. Kanter also outlines future innovations and competitions focusing on enhancing data for better machine learning outcomes.

INSIGHT

MLCommons Mission

MLCommons aims to improve machine learning for everyone by stimulating innovation.
Their focus areas include benchmarks, data sets, and tools to benefit society.

INSIGHT

Data Sets as Raw Ingredients

Data sets are crucial for machine learning advancements, similar to raw materials in other industries.
Public data sets allow researchers, even at large companies, to share techniques and advance the field.

ANECDOTE

Criteo Data Set

Criteo, a European company, opened an older data set for MLCommons benchmarks.
Criteo benefited indirectly, because systems improved against its data set also improved its own processes.
