Learning Through Multiple Modalities
The main finding from this work, for us, was that you can actually learn from both video and audio. It's just a simple application of contrastive learning: basically, a video and its corresponding audio are positives, and mismatched video and audio are negatives. It works because these things are correlated, like speech coming out of a human's mouth as the source of the audio. The typical use case is basically someone playing a particular kind of musical instrument on Guitar World. So you can't learn anything about bananas there. Or when humans mention bananas: when they say the word banana, you can't trust basically anything.
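To make the positives-and-negatives idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of video and audio embeddings. It is not the method from the work being discussed, only one common way to set this up. The function names, the temperature value, and the random toy embeddings are all assumptions made for illustration; in practice the embeddings would come from trained video and audio encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def audio_video_contrastive_loss(video_emb, audio_emb, temperature=0.1):
    # Row i of video_emb and row i of audio_emb are assumed to come from the
    # same clip (a positive pair); every other pairing in the batch is a negative.
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(v))
    # Video-to-audio: each video should pick out its own audio among all audios.
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Audio-to-video: each audio should pick out its own video among all videos.
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2a + loss_a2v)

# Toy data (hypothetical): audio embeddings that are noisy copies of the video ones.
rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16))
aligned_audio = video + 0.05 * rng.normal(size=(8, 16))
# Shift every audio by one position so no clip is paired with its own sound.
shuffled_audio = np.roll(aligned_audio, 1, axis=0)

aligned_loss = audio_video_contrastive_loss(video, aligned_audio)
shuffled_loss = audio_video_contrastive_loss(video, shuffled_audio)
print("aligned loss:", aligned_loss)
print("shuffled loss:", shuffled_loss)
```

The loss is low when each clip's audio is closest to its own video and high when the pairing is broken, which is the correlation signal described above. It also shows the limitation: the objective only rewards things that are visible and audible together in the same clip.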