The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Delivering Neural Speech Services at Scale with Li Jiang - #522

Sep 27, 2021
Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, discusses the rapid advances in speech recognition. He covers the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality, the importance of customizing voice solutions for different industries, and the ethical considerations surrounding text-to-speech technologies. He also looks ahead to the future of speech services and the goal of human-like communication.
ANECDOTE

Early Speech Recognition Work

  • Li Jiang's interest in speech recognition started in college while building a system on an Apple II.
  • He interned at Microsoft Research in 1994 and has worked on speech technology for 27 years.
INSIGHT

HMMs in Speech Recognition

  • Statistical approaches, particularly Hidden Markov Models (HMMs), revolutionized speech recognition.
  • HMMs enabled large vocabulary, speaker-independent, and continuous speech recognition.
INSIGHT

End-to-End Models

  • End-to-end models are compact and well suited to on-device use; they are 100 times smaller than traditional architectures.
  • They jointly model acoustic and language aspects in a single model.