Speechmatics CTO - Next-Generation Speech Recognition

Oct 23, 2024

Will Williams, CTO of Speechmatics, discusses breakthroughs in speech recognition. He describes a hybrid approach built on unsupervised learning that requires 100x less data than traditional methods. The conversation covers latency-accuracy trade-offs and the complexities of real-time automatic speech recognition, including the challenges of speaker identification and source separation. Williams also critiques the evolution of deep learning frameworks, emphasizes the role of diverse data in training robust systems, and touches on Speechmatics' growth and ethical considerations in AI.

AI Snips

INSIGHT

Whisper's Weakness

Whisper's hallucinations stem from supervised training on noisy data combined with its end-to-end architecture.
An end-to-end architecture offers less control over outputs than Speechmatics' approach.

ADVICE

Unsupervised Learning

Prioritize unsupervised learning to achieve better generalization and sample efficiency.
Train a small supervised system on top of a larger unsupervised model for optimal results.
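
As a rough illustration of the advice above, here is a minimal sketch of the general pattern: learn a representation from plentiful unlabeled data, then train a small supervised model on a few labeled examples. Everything in it is an assumption made for illustration. The toy data, the function names (`fit_unsupervised_encoder`, `fit_supervised_head`, `sample`), and the use of PCA and logistic regression as stand-ins are not Speechmatics' actual method, which the episode summary does not specify.

```python
import numpy as np

def fit_unsupervised_encoder(unlabeled, n_components):
    """Learn a linear encoder (PCA) from unlabeled data only."""
    mean = unlabeled.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    components = vt[:n_components]
    return lambda x: (x - mean) @ components.T

def fit_supervised_head(features, labels, lr=0.1, steps=500):
    """Train a small logistic-regression head on the encoded features."""
    w = np.zeros(features.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * features.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b

# Hypothetical toy data: a 2-D latent signal embedded in 20 noisy dimensions.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 20))

def sample(n):
    latent = rng.normal(size=(n, 2))
    labels = (latent[:, 0] > 0).astype(float)
    observed = latent @ mixing + 0.1 * rng.normal(size=(n, 20))
    return observed, labels

# Stage 1: many unlabeled examples (the labels are discarded).
unlabeled, _ = sample(5000)
encode = fit_unsupervised_encoder(unlabeled, n_components=2)

# Stage 2: only 50 labeled examples train the small supervised head.
small_x, small_y = sample(50)
w, b = fit_supervised_head(encode(small_x), small_y)

# Evaluate on held-out data.
test_x, test_y = sample(1000)
predictions = (encode(test_x) @ w + b) > 0
accuracy = (predictions == test_y).mean()
print(f"held-out accuracy with 50 labels: {accuracy:.2f}")
```

The supervised head only has to fit three parameters because the encoder has already compressed the input, which is one way to read the sample-efficiency claim.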

INSIGHT

Intelligence as Compression

Intelligence involves domain-specific compression by creating abstractions.
This compression should generalize to unknown future tasks, similar to human learning.