How to Train Computer Vision Models to Match Correct Captions

We collect and release a big annotated corpus of New Yorker cartoons. In the pixel setting, we use annotations like the ones you're describing to train computer vision models. But in the description setting, we sort of circumvent that process by just handing the models the human-authored descriptions. GPT-4 gets around 65% accuracy at this 50/50 task. We might expect human performance to also be lower, exactly because humor is more subjective.
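To make the "50/50 task" concrete, here is a minimal Python sketch of how accuracy on a two-way caption matching task could be scored in the description setting. It assumes "50/50" means each item offers two candidate captions, exactly one of which belongs to the cartoon, so random guessing scores about 50%. The names `MatchingItem`, `accuracy`, and `always_first`, along with the toy items, are hypothetical and made up for illustration; this is not the speakers' evaluation code or data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class MatchingItem:
    """One hypothetical two-way matching item (description setting)."""
    description: str            # human-authored description of the cartoon
    captions: Tuple[str, str]   # two candidate captions
    correct: int                # index (0 or 1) of the caption that belongs to the cartoon


# A chooser takes a description and two captions and returns the index it picks.
Chooser = Callable[[str, Tuple[str, str]], int]


def accuracy(items: Sequence[MatchingItem], choose: Chooser) -> float:
    """Fraction of items where the chooser picks the correct caption."""
    if not items:
        raise ValueError("need at least one item")
    hits = sum(choose(it.description, it.captions) == it.correct for it in items)
    return hits / len(items)


def always_first(description: str, captions: Tuple[str, str]) -> int:
    """Trivial baseline that ignores its input and always picks caption 0."""
    return 0


# Toy items (invented placeholders, not drawn from the real corpus).
items = [
    MatchingItem("description A", ("caption 1", "caption 2"), 0),
    MatchingItem("description B", ("caption 3", "caption 4"), 1),
    MatchingItem("description C", ("caption 5", "caption 6"), 0),
    MatchingItem("description D", ("caption 7", "caption 8"), 1),
]

baseline = accuracy(items, always_first)
```

Because the correct caption is in position 0 for half of the toy items, the trivial baseline lands at chance. A real chooser would wrap a model call, and its score would be read against that 50% floor.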

