The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Reasoning Over Complex Documents with DocLLM with Armineh Nourbakhsh - #672

Feb 19, 2024
Armineh Nourbakhsh, Executive Director at JP Morgan AI Research, discusses DocLLM, a layout-aware large language model designed for document understanding. She traces the evolution of document AI, focusing on multimodal approaches that combine textual and visual data. Nourbakhsh also covers the challenges of training generative models, the intricacies of processing enterprise documents, and strategies for reducing hallucinations in language models to improve performance on complex document analysis.
ANECDOTE

Document AI Challenge at S&P Global

  • Armineh Nourbakhsh's first multimodal document AI challenge involved automating analysis of client documents at S&P Global.
  • Credit rating analysts were reviewing hundreds of pages of documents, which motivated automating the work with AI.
INSIGHT

Document AI: An Unsolved Problem

  • Despite advancements, document AI remains an unsolved problem, especially in enterprise settings.
  • Encoder-only architectures dominate, requiring frequent fine-tuning for new tasks and data distributions.
INSIGHT

Layout-Aware LLMs

  • DocLLM incorporates layout information by modeling text and spatial layout separately before fusing them.
  • This allows the model to learn disentangled representations, addressing limitations of previous approaches.
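The idea of keeping text and layout apart until the attention step can be sketched as follows. This is a minimal illustration, not DocLLM's actual implementation: it assumes each token carries one vector derived from its text and a separate vector derived from its bounding box, and that fusion happens by summing four attention score terms (text-to-text, text-to-spatial, spatial-to-text, spatial-to-spatial). The function name `fused_attention` and the weights `lam_ts`, `lam_st`, `lam_ss` are hypothetical, and causal masking and learned projections are omitted for brevity.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def fused_attention(text_q, text_k, box_q, box_k,
                    lam_ts=1.0, lam_st=1.0, lam_ss=1.0):
    """Attention weights from separately modeled text and layout vectors.

    Assumption: the two modalities are fused by adding cross-modal score
    terms, each scaled by a hypothetical weight (lam_*).
    """
    weights = []
    for i in range(len(text_q)):
        row = []
        for j in range(len(text_k)):
            score = (
                dot(text_q[i], text_k[j])             # text -> text
                + lam_ts * dot(text_q[i], box_k[j])   # text -> spatial
                + lam_st * dot(box_q[i], text_k[j])   # spatial -> text
                + lam_ss * dot(box_q[i], box_k[j])    # spatial -> spatial
            )
            row.append(score)
        weights.append(softmax(row))
    return weights


# Two toy tokens: text vectors and (made-up) bounding-box vectors.
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
box_queries = [[1.0, 0.0], [1.0, 0.0]]
box_keys = [[1.0, 0.0], [0.0, 1.0]]

# Setting every lam_* to 0 recovers plain text-only attention.
text_only = fused_attention(text_vecs, text_vecs, box_queries, box_keys,
                            lam_ts=0.0, lam_st=0.0, lam_ss=0.0)
fused = fused_attention(text_vecs, text_vecs, box_queries, box_keys)
```

Because the spatial terms enter as separate addends, they can be scaled or switched off without touching the text representation, which is one way to read "disentangled" in the snip above.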