The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Delivering Neural Speech Services at Scale with Li Jiang - #522

Sep 27, 2021

Li Jiang, a distinguished engineer at Microsoft with 27 years of experience in speech technologies, dives into the rapid advancements in speech recognition. He discusses the trade-offs between hybrid and end-to-end models and their implications for accuracy and service quality. Jiang also highlights the importance of customizing voice solutions for different industries and emphasizes the ethical considerations surrounding text-to-speech technologies. With a forward-looking perspective, he envisions the future of speech services, focusing on achieving human-like communication.

AI Snips

ANECDOTE

Early Speech Recognition Work

Li Jiang's interest in speech recognition started in college while building a system on an Apple II.
He interned at Microsoft Research in 1994 and has worked on speech technology for 27 years.

INSIGHT

HMMs in Speech Recognition

Statistical approaches, particularly Hidden Markov Models (HMMs), revolutionized speech recognition.
HMMs enabled large vocabulary, speaker-independent, and continuous speech recognition.

INSIGHT

End-to-End Models

End-to-end models are 100 times smaller than traditional architectures, which makes them compact enough to run on devices.
They jointly model acoustic and language aspects in a single model.
