The Best Way to Train a Vision Plus Language Encoder?

The vision encoder does not predict the vision patch. They take the vision images as additional tokens to the language instead. And it is just predicting the cross-modal matching. [unintelligible] If you had infinite compute and the resources to get any already existing data you could, what's your intuition for, like, the best way to train a vision plus language encoder? Does the question make sense?
