Technical advances in document understanding

Practical AI

Vision-language models for documents

Daniel outlines vision-language models (VLMs): they take image and text inputs, map both into joint embeddings, and produce text output as a token stream for reasoning.

Play episode from 34:19
