The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Multi-modal Deep Learning for Complex Document Understanding with Doug Burdick - #541

Dec 2, 2021
Doug Burdick, a principal research staff member at IBM Research, specializes in making complex documents machine-readable. He discusses the fusion of NLP and computer vision to tackle PDF extraction challenges, especially for COVID-19 data. The conversation delves into innovative methods for effective table extraction and the importance of evaluation metrics to ensure accuracy. Doug emphasizes collaboration within research communities and the need for robust systems to advance understanding of complex, multi-modal data formats.
ANECDOTE

CORD-19 Table Enhancement

  • The Allen Institute for AI's CORD-19 dataset of COVID-19 research papers initially offered only plain text extracted from PDFs.
  • IBM contributed extracted table data, the most requested feature, improving the dataset's usability for researchers and for the associated Kaggle challenge.
INSIGHT

PDFs as Images

  • PDF is an archival format: saving to it strips the document's structural metadata, which makes data extraction difficult.
  • As a result, PDFs must be treated much like images, with information recovered from visual cues.
INSIGHT

Multimodal Table Extraction

  • Identifying tables requires a multimodal approach that combines visual and linguistic cues.
  • Deep learning handles the initial detection, but NLP is crucial for interpreting complex table structure and the language within cells.