Unification of Text, Image, Vision, and Layout
The new UDOP model unifies text, image, vision, and layout. So it's not just about text and images; it's about text, images, and layout. And the best part is that there's a whole benchmark for this, where you have a set of questions that you need to answer about these documents. The way it basically does this is through joint pre-training objectives.