121. Alexei Baevski - data2vec and the future of multimodal learning

Towards Data Science

CHAPTER

Learning the Structure of Speech Through Language Models

We spent several years developing the wav2vec line of work. We tried to build relatively lightweight models to learn the structure of speech data and see if we could improve the models that transcribe speech into text. Eventually we built vq-wav2vec, which tried to marry the text field and the speech field together by encoding speech into discrete units. The next step was to figure out whether the discretization of speech is actually necessary to achieve these kinds of results. It turned out that this worked pretty well. You can get a model that is very accurate and gives you transcriptions that are useful in the real world.
