The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Reasoning Over Complex Documents with DocLLM with Armineh Nourbakhsh - #672

Feb 19, 2024

Armineh Nourbakhsh, Executive Director at JP Morgan AI Research, discusses DocLLM, a layout-aware large language model designed for document understanding. She traces the evolution of document AI, focusing on multimodal approaches that combine textual and visual data. Nourbakhsh also covers the challenges of training generative models, the intricacies of processing enterprise documents, and strategies for reducing hallucinations in language models to improve performance on complex document analysis.

ANECDOTE

Document AI Challenge at S&P Global

Armineh Nourbakhsh's first multimodal document AI challenge involved automating analysis of client documents at S&P Global.
Credit rating analysts were reviewing hundreds of pages of these documents, which prompted the effort to automate the work with AI.

INSIGHT

Document AI: An Unsolved Problem

Despite advancements, document AI remains an unsolved problem, especially in enterprise settings.
Encoder-only architectures dominate, requiring frequent fine-tuning for new tasks and data distributions.

INSIGHT

Layout-Aware LLMs

DocLLM incorporates layout information by modeling text and spatial layout separately before fusing them.
This allows the model to learn disentangled representations, addressing limitations of previous approaches.
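
The idea of scoring text and layout separately and then fusing the two can be illustrated with a small sketch. This is not DocLLM's actual formulation: the function name `layout_aware_attention`, the weight names, the mixing factor `lam`, and the choice to fuse by adding a text-to-text score and a box-to-box score are all assumptions made for illustration, and the sketch omits causal masking and any cross-modal terms.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def layout_aware_attention(text_emb, box_emb, weights, lam=1.0):
    """Hypothetical single-head attention that keeps text and layout apart.

    text_emb: n x d_text token embeddings.
    box_emb:  n x d_box bounding-box features (here: x0, y0, x1, y1).
    weights:  separate projection matrices per modality (assumed names).
    lam:      assumed scalar controlling how much layout affects the scores.
    """
    # Text and layout each get their own query/key projections.
    q_text = matmul(text_emb, weights["q_text"])
    k_text = matmul(text_emb, weights["k_text"])
    v_text = matmul(text_emb, weights["v_text"])
    q_box = matmul(box_emb, weights["q_box"])
    k_box = matmul(box_emb, weights["k_box"])

    # Score each modality separately.
    text_scores = matmul(q_text, transpose(k_text))
    box_scores = matmul(q_box, transpose(k_box))

    # Fuse only at the score level, then normalize.
    scale = math.sqrt(len(q_text[0]))
    n = len(text_emb)
    attn = []
    for i in range(n):
        fused = [(text_scores[i][j] + lam * box_scores[i][j]) / scale for j in range(n)]
        attn.append(softmax(fused))

    # Values come from the text stream only in this sketch.
    return matmul(attn, v_text), attn


rng = random.Random(0)
n_tokens, d_text, d_box, d_head = 3, 4, 4, 2
text_emb = random_matrix(n_tokens, d_text, rng)
# Made-up normalized boxes: two tokens on one line, one lower on the page.
box_emb = [[0.10, 0.10, 0.30, 0.20], [0.35, 0.10, 0.60, 0.20], [0.10, 0.80, 0.30, 0.90]]
weights = {
    "q_text": random_matrix(d_text, d_head, rng),
    "k_text": random_matrix(d_text, d_head, rng),
    "v_text": random_matrix(d_text, d_head, rng),
    "q_box": random_matrix(d_box, d_head, rng),
    "k_box": random_matrix(d_box, d_head, rng),
}
output, attn = layout_aware_attention(text_emb, box_emb, weights, lam=1.0)
```

Setting `lam=0` reduces the sketch to ordinary text-only attention, which makes the layout contribution easy to isolate when experimenting.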
