

Delivering Neural Speech Services at Scale with Li Jiang - #522
Sep 27, 2021
Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, dives into the rapid advancements in speech recognition. He discusses the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality. Jiang also highlights the importance of customizing voice solutions for different industries and emphasizes the ethical considerations surrounding text-to-speech technologies. With a forward-looking perspective, he envisions the future of speech services, focusing on achieving human-like communication.
AI Snips
Early Speech Recognition Work
- Li Jiang's interest in speech recognition started in college while building a system on an Apple II.
- He interned at Microsoft Research in 1994 and has worked on speech technology for 27 years.
HMMs in Speech Recognition
- Statistical approaches, particularly Hidden Markov Models (HMMs), revolutionized speech recognition.
- HMMs enabled large vocabulary, speaker-independent, and continuous speech recognition.
End-to-End Models
- End-to-end models are compact enough to run on devices: they are 100 times smaller than traditional hybrid architectures.
- They model acoustics and language jointly in a single model.