Machine Learning Street Talk (MLST)

Speechmatics CTO - Next-Generation Speech Recognition

Oct 23, 2024
Will Williams, CTO of Speechmatics, shares breakthroughs in speech recognition. He describes a hybrid approach that uses unsupervised learning, requiring 100x less data than traditional methods. The conversation dives into latency-accuracy trade-offs and the complexities of real-time automatic speech recognition, highlighting speaker identification and source separation challenges. Williams critiques the evolution of deep learning frameworks and emphasizes the critical role of diverse data in training robust systems. The conversation also touches on Speechmatics' growth and ethical considerations in AI.
INSIGHT

Whisper's Weakness

  • Whisper's hallucinations stem from supervised training on noisy data and from its end-to-end architecture.
  • This architecture offers less control over outputs, unlike Speechmatics' approach.
ADVICE

Unsupervised Learning

  • Prioritize unsupervised learning to achieve better generalization and sample efficiency.
  • Train a small supervised system on top of a larger unsupervised model for optimal results.
INSIGHT

Intelligence as Compression

  • Intelligence involves domain-specific compression by creating abstractions.
  • This compression should generalize to unknown future tasks, similar to human learning.