Adam Coates, Director of Baidu's Silicon Valley AI Lab, discusses the lab's speech-to-text and text-to-speech projects, scaling speech recognition for better performance, the potential of speech recognition as a product, concerns about faked voices, the impact of AI on jobs, and the process of finding and nurturing experts in the AI field.
Podcast summary created with Snipd AI
Quick takeaways
Baidu achieved superhuman performance in Mandarin speech recognition by scaling up the use of neural networks, investing in computational resources, and collecting massive amounts of data.
Baidu is focused on creating immersive AI products with speech recognition and text-to-speech (TTS) interfaces, aiming to bring TTS to human-level quality and exploring the potential of a voice-first keyboard.
Deep dives
Adam Coates and Baidu's Focus on AI Technologies
Adam Coates, the director of Baidu's Silicon Valley AI Lab, discusses the company's focus on developing AI technologies that will impact at least 100 million people. Baidu, China's largest search engine, has moved from leading PC search to competing in the mobile revolution and is now increasingly becoming an AI company. The Silicon Valley AI Lab, one of Baidu's four research labs, was founded to bridge the gap between rapidly advancing deep learning and AI research and practical business applications.
Speech Recognition Breakthrough with Deep Speech
Adam Coates talks about Baidu's work on speech recognition and the development of its deep learning-powered speech engine, Deep Speech. The goal was to achieve human-level speech recognition for every product and context, regardless of accent or background noise. By scaling up the use of neural networks, investing in computational resources, and collecting massive amounts of data, Baidu achieved superhuman performance in Mandarin speech recognition. However, the challenge remains to reduce the amount of data needed and to develop machine learning systems that can reach human performance with fewer labeled examples.
Text-to-Speech and the Future of AI Products
Adam Coates also discusses Baidu's efforts in text-to-speech (TTS) technology, the aim of bringing it to human-level quality, and the potential of a voice-first keyboard. Baidu's research focuses on creating immersive AI products where speech recognition and TTS are the primary interfaces. By leveraging deep learning and abandoning hand-engineered solutions, Baidu has made significant progress in building better speech systems. However, challenges remain, such as handling multiple speakers, background noise, and long-form transcription. Future research will aim to address these challenges and make AI technologies even more impactful.