
121. Alexei Baevski - data2vec and the future of multimodal learning
Towards Data Science
Learning the Structure of Speech Through Language Models
We spent several years developing the wav2vec line of work. We tried to build relatively lightweight models to learn the structure of speech data and see if we could improve the models that transcribe speech into text. Eventually we built vq-wav2vec, which tries to marry the text field and the speech field together by encoding speech into discrete units. The next step was to figure out whether the discretization of speech is actually necessary to achieve these kinds of results. It turned out that this worked pretty well. You can get a model that is very accurate and gives you transcriptions that are useful in the real world.