Challenges in Developing End-to-End Language Models for Audio

This chapter explores the difficulties of building end-to-end language models for audio, in particular the hurdles in tokenizing and translating speech effectively. It cites OpenAI's Whisper as a notable step in this area and discusses advances in audio modeling and surprises in model performance.
