The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Multi-modal Deep Learning for Complex Document Understanding with Doug Burdick - #541

Dec 2, 2021
Doug Burdick, a principal research staff member at IBM Research, specializes in making complex documents machine-readable. He discusses the fusion of NLP and computer vision to tackle PDF extraction challenges, especially for COVID-19 data. The conversation delves into innovative methods for effective table extraction and the importance of evaluation metrics to ensure accuracy. Doug emphasizes collaboration within research communities and the need for robust systems to advance understanding of complex, multi-modal data formats.
ANECDOTE

CORD-19 Table Enhancement

  • The Allen Institute for AI's CORD-19 dataset of COVID-19 research papers initially offered only plain text extracted from PDFs.
  • IBM contributed extracted table data, the most requested feature, improving the dataset's usability for researchers and for the associated Kaggle challenge.
INSIGHT

PDFs as Images

  • PDF is an archival format: saving to it strips the document's structural metadata, which makes data extraction difficult.
  • As a result, PDFs must be treated much like images, with information recovered from visual cues.
INSIGHT

Multimodal Table Extraction

  • Identifying tables requires a multimodal approach that combines visual and linguistic cues.
  • Deep learning handles the initial detection, but NLP is crucial for interpreting complex table structure and the language within cells.