The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

How Deep Learning has Revolutionized OCR with Cha Zhang - #416

Oct 5, 2020
Cha Zhang, a Partner Engineering Manager at Microsoft Cloud & AI, dives into the exciting world of optical character recognition (OCR) technology. He discusses how deep learning is transforming OCR from handling basic documents to tackling complex image recognition challenges. Cha addresses traditional hurdles, like accuracy and localization, while exploring the benefits of semi-supervised approaches and neural architecture search. He also highlights innovative tools, like Form Recognizer, that enhance document processing, balancing automation with the essential need for human oversight.
INSIGHT

OCR's Evolution

  • OCR is no longer just for scanned documents.
  • Deep learning allows for highly accurate text recognition in images 'in the wild', such as photos.

INSIGHT

Challenges of OCR in the Wild

  • OCR 'in the wild' faces challenges such as scale variations, varying aspect ratios, and perspective distortions.
  • Cluttered backgrounds in photos complicate text recognition, especially background patterns that look visually similar to characters.

INSIGHT

High IoU for OCR

  • Object detection uses Intersection over Union (IoU) to measure localization accuracy, typically counting a detection as correct when its IoU with the ground truth is at least 0.5.
  • OCR requires much higher IoU (0.9-0.95) to avoid missing characters and ensure accurate recognition.
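To see why 0.5 is too lenient for text, here is a minimal sketch that computes IoU for axis-aligned boxes given as (x1, y1, x2, y2) and applies it to a hypothetical text line of 10 characters, each 20 px square. The line geometry, the two shifted predictions, and the `chars_clipped` helper are illustrative assumptions, not details from the episode. Real OCR detectors often predict rotated or quadrilateral boxes, which this sketch does not handle.

```python
from typing import List, Tuple

# Axis-aligned box: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def chars_clipped(pred: Box, char_boxes: List[Box]) -> int:
    """Hypothetical helper: count characters NOT lying entirely inside pred."""
    return sum(
        1
        for c in char_boxes
        if not (c[0] >= pred[0] and c[1] >= pred[1]
                and c[2] <= pred[2] and c[3] <= pred[3])
    )


# Hypothetical text line: 10 characters, each 20 px wide and 20 px tall.
chars: List[Box] = [(20.0 * i, 0.0, 20.0 * (i + 1), 20.0) for i in range(10)]
ground_truth: Box = (0.0, 0.0, 200.0, 20.0)

# Two predictions that would pass a 0.5 IoU object-detection threshold.
predictions = {
    "shifted right 40 px": (40.0, 0.0, 240.0, 20.0),
    "shifted down 5 px": (0.0, 5.0, 200.0, 25.0),
}

for name, pred in predictions.items():
    print(f"{name}: IoU={iou(pred, ground_truth):.3f}, "
          f"clipped characters={chars_clipped(pred, chars)}/10")
```

Both predictions clear the 0.5 bar (IoU of about 0.667 and 0.600), yet the first drops the first two characters entirely and the second cuts a quarter off the height of every character. Because text lines are long and thin, small localization errors remove whole characters or damage all of them, which is why OCR localization needs much tighter overlap than generic object detection.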