The Future of Vision and Language Encoders
GPT-4 has some pretty interesting elements in terms of zero-shot image understanding capabilities, and it can also give a step-by-step explanation of how it did that reasoning. But there's still a long way to go for GPT-4 and many other models like LXMERT, which was one of the first multimodal BERTs. The sad part is that these models are still far out of reach in terms of the resources available to most people, right? I mean, obviously most of academia, but also just in general, right?