4min chapter

121. Alexei Baevski - data2vec and the future of multimodal learning

Towards Data Science

CHAPTER

Learning the Structure of Speech Through Language Models

We spent several years developing the wav2vec line of work. We tried to build relatively lightweight models to learn the structure of speech data and see if we could improve the models that transcribe speech into text. Eventually we built vq-wav2vec, which tries to marry the text field and the speech field together by encoding speech into discrete units. The next step was to figure out whether the discretization of speech is actually necessary to achieve these kinds of results. It turned out that this worked pretty well. You can get a model that is very accurate and gives you transcriptions that are useful in the real world.
