Towards Data Science cover image

121. Alexei Baevski - data2vec and the future of multimodal learning

Towards Data Science

00:00

Learning the Structure of Speech Through Language Models

We spent several years developing the spatto back line of work. We tried to build relatively light weight models to try to learn the structure of speech data and see if we can improve in te models that transcribe speech into text. And eventually we built a nobicu bafte back, which is whih try to mary the text field and the speech field together by in coging speech into discreet units,. The next step was to figure out whether the discreetation of speech is actually necessary to achieve these kind of results. It turned out that this worked pretty well. You can get a model that is very accurate and gives you transcriptions that are useful in the real world.

Transcript
Play full episode

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!
App store bannerPlay store banner
Get the app