#151 – Dan Kokotov: Speech Recognition with AI and Humans
Jan 4, 2021
Dan Kokotov, VP of Engineering at Rev.ai, shares his expertise in automatic speech recognition (ASR) technology. He discusses the challenges of real-time transcription, including accuracy issues with accents and pacing. Kokotov emphasizes the role of user feedback and data quality in improving ASR systems. He also explores the future of transcription services in the gig economy and highlights the importance of bridging human and machine efforts. The conversation also touches on the evolution of podcasting and the need for standardized transcripts to enhance accessibility.
Rev.ai aims to revolutionize the transcription industry by providing accurate and high-quality transcriptions with a seamless user experience.
Transcriptions can enhance the discoverability and engagement of audio content, enabling deeper analysis and facilitating remote collaboration.
Podcasts have the unique ability to create a sense of community and belonging, establishing deep connections between hosts and listeners.
Deep dives
Rev.ai: Transforming Speech to Text
Rev.ai is a platform that offers speech-to-text AI services, including captioning and transcription. The company aims to improve on the traditional freelancer marketplace model and provide a seamless and efficient experience for both customers and freelancers. Rev.ai's ASR (automatic speech recognition) technology delivers accurate and high-quality transcriptions, with the goal of achieving a 3% word error rate. The platform allows users to easily upload audio files, select preferences, and receive rapid, machine-generated transcripts that can be edited and improved by human transcribers. Rev.ai envisions a future where all meetings and conversations are easily accessible through indexed and searchable transcripts, facilitating better search, discovery, and analysis of audio content.
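Word error rate (WER), the metric behind the 3% target mentioned above, is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that definition, assuming simple lowercase-and-whitespace tokenization. The function name and sample sentences are illustrative only and are not taken from the episode or from Rev.ai's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are compared after lowercasing and splitting on
    whitespace. Real evaluations usually also normalize punctuation and numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substitution ("jumps" -> "jumped") and
# one deletion (the second "the") against a 9-word reference.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # prints "WER: 0.222"
```

By this definition, a 3% WER corresponds to roughly 30 word-level errors in a 1,000-word reference transcript.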
Building a Platform for Empowered Creators
Rev.ai is dedicated to empowering creators and making transcription and captioning widely accessible. By providing an API and tools like the Rev Editor, the platform allows developers and users to harness the power of ASR to build new applications and experiences. The company's goal is to enable users to easily search, reference, and share audio content, making podcasts and other audio formats more discoverable and user-friendly. Rev.ai also recognizes the importance of data and continuously works to improve the accuracy of its ASR models using the vast amount of annotated data generated by its human transcriptionists, as well as advanced machine learning techniques.
Transcriptions: A Game Changer for Podcasts and Meetings
Rev.ai envisions a world where transcriptions become a standard feature for podcasts, meetings, and other audio formats. With accurate and searchable transcripts, podcasts become more discoverable and shareable, enabling deeper analysis and engagement with content. Transcriptions also enhance remote collaboration, allowing teams to easily reference and revisit meeting conversations. The platform aims to simplify the process of annotating audio and video by providing a seamless user experience and powerful tools for transcription and captioning. Rev.ai believes that widespread adoption of transcriptions can revolutionize the way people consume and interact with audio content.
Rev.ai's Commitment to Quality and Innovation
Rev.ai is committed to pushing the boundaries of speech-to-text AI and delivering the best possible tools and services to its customers. With a focus on data, accuracy, and simplicity, the company strives to create a platform that meets the needs of creators, journalists, researchers, and businesses alike. By harnessing the power of ASR technology and leveraging the expertise of human transcriptionists, Rev.ai aims to transform the way audio content is transcribed, indexed, and utilized, paving the way for a more accessible and engaged audio ecosystem.
The Challenges of Platform Censorship
The podcast episode delves into the challenges faced by platforms like YouTube and Twitter when it comes to censorship. The conversation explores the difficulties in filtering out conspiracy theories and determining what content represents dangerous untruths. It also touches upon the balance between free speech and encouraging kindness and respect among users.
The Evolution of Podcasting and its Impact
The episode discusses the power and impact of podcasting as a medium. It highlights how podcasts, even though the connection between host and listeners runs one way, allow for deep and meaningful conversations and connections. The conversation explores how podcasts like Joe Rogan's have created a sense of friendship and camaraderie with listeners, transcending physical boundaries and providing a sense of belonging in a world of limited human interactions.
Dan Kokotov is VP of Engineering at Rev.ai, an automatic speech recognition company. Please support this podcast by checking out our sponsors:
– Athletic Greens: https://athleticgreens.com/lex and use code LEX to get 1 month of fish oil
– Blinkist: https://blinkist.com/lex and use code LEX to get 25% off premium
– Business Wars: https://wondery.com/business-wars/
– Cash App: https://cash.app/ and use code LexPodcast to get $10
OUTLINE:
Here are the timestamps for the episode. On some podcast players you should be able to click a timestamp to jump to that time.
(00:00) – Introduction
(09:17) – Dune
(12:34) – Rev
(18:33) – Translation
(25:22) – Gig economy
(34:02) – Automatic speech recognition
(44:53) – Create products that people love
(53:02) – The future of podcasts at Spotify
(1:14:41) – Book recommendations
(1:16:02) – Stories of our dystopian future
(1:19:45) – Movies about Stalin and Hitler
(1:24:59) – Interviewing Putin
(1:30:56) – Meaning of life