

Reasoning Over Complex Documents with DocLLM with Armineh Nourbakhsh - #672
Feb 19, 2024
Armineh Nourbakhsh, Executive Director at JP Morgan AI Research, dives into DocLLM, a layout-aware large language model designed for document understanding. She shares insights on the evolution of document AI, focusing on multimodal approaches that combine textual and visual data. Nourbakhsh discusses the challenges of training generative models, the intricacies of processing enterprise documents, and strategies for reducing hallucinations in language models to improve performance on complex document analysis.
AI Snips
Document AI Challenge at S&P Global
- Armineh Nourbakhsh's first multimodal document AI challenge involved automating analysis of client documents at S&P Global.
- Credit rating analysts were reviewing hundreds of pages of these documents, which motivated automating the task with AI.
Document AI: An Unsolved Problem
- Despite advancements, document AI remains an unsolved problem, especially in enterprise settings.
- Encoder-only architectures dominate, requiring frequent fine-tuning for new tasks and data distributions.
Layout-Aware LLMs
- DocLLM incorporates layout information by modeling text and spatial layout separately before fusing them.
- This allows the model to learn disentangled representations, addressing limitations of previous approaches.
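The idea of keeping text and layout separate until the attention step can be sketched as follows. This is a minimal illustration, not DocLLM's actual implementation: it assumes each token already has a text vector and a bounding-box (layout) vector of the same size, and that the attention score is a weighted sum of text-to-text, text-to-layout, layout-to-text, and layout-to-layout terms. The function name `disentangled_attention` and the weights `lam_ts`, `lam_st`, and `lam_ss` are hypothetical names chosen for this example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(text_q, text_k, box_q, box_k, values,
                           lam_ts=1.0, lam_st=1.0, lam_ss=1.0):
    """Causal attention whose scores mix text and layout terms.

    text_q/text_k: per-token text query/key vectors (assumed pre-projected).
    box_q/box_k:   per-token layout query/key vectors (assumed pre-projected).
    lam_*:         hypothetical weights on the cross-modal and layout terms.
    """
    n = len(text_q)
    scale = math.sqrt(len(text_q[0]))
    out = []
    for i in range(n):
        scores = []
        for j in range(i + 1):  # causal mask: token i sees tokens 0..i
            s = (dot(text_q[i], text_k[j])               # text -> text
                 + lam_ts * dot(text_q[i], box_k[j])     # text -> layout
                 + lam_st * dot(box_q[i], text_k[j])     # layout -> text
                 + lam_ss * dot(box_q[i], box_k[j]))     # layout -> layout
            scores.append(s / scale)
        w = softmax(scores)
        # Weighted sum of value vectors for the visible tokens.
        out.append([sum(w[j] * values[j][d] for j in range(i + 1))
                    for d in range(len(values[0]))])
    return out
```

Setting all three weights to zero reduces this to ordinary text-only causal attention, which makes it easy to see how much the layout terms shift the attention pattern for a given input.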