The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Snorkel: A System for Fast Training Data Creation with Alex Ratner - TWiML Talk #270

May 30, 2019
In this discussion, Alex Ratner, a Ph.D. student at Stanford and creator of Snorkel, dives into new approaches to data labeling. He explains how Snorkel simplifies the creation of training data using weak supervision, replacing hand-labeling with programmatic methods. Ratner shares real-world applications, including collaborations with companies like Google. The conversation also addresses the complexities of labeling functions, the influence of human biases in machine learning, and future advancements like Snorkel MeTaL for multitask learning.
INSIGHT

Training Data Bottleneck

  • Deep learning models need lots of labeled training data, which is a bottleneck for their application.
  • Snorkel allows subject matter experts to provide higher-level inputs like rules or patterns to train these models.
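The rules and patterns mentioned above take the form of labeling functions: small programs that vote a label or abstain, whose noisy outputs are then combined. Below is a minimal plain-Python sketch of that idea using a simple majority vote (Snorkel itself learns the accuracies of labeling functions with a generative model rather than voting; the function and label names here are illustrative, not Snorkel's API):

```python
# Illustrative labeling functions for sentiment: each encodes one heuristic
# and either votes a label or abstains.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_contains_great(text):
    # Heuristic: "great" suggests a positive example.
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):
    # Heuristic: "terrible" suggests a negative example.
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN

def majority_label(text, lfs):
    # Combine the noisy votes; Snorkel instead models LF accuracies
    # and correlations to denoise them.
    votes = [lf(text) for lf in lfs]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

lfs = [lf_contains_great, lf_contains_terrible]
print(majority_label("This product is great!", lfs))   # 1 (POSITIVE)
print(majority_label("No opinion words here.", lfs))   # -1 (ABSTAIN)
```

The resulting probabilistic labels are then used to train a standard discriminative model, so the end model can generalize beyond the rules themselves.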
ANECDOTE

Deep Dive and Memex

  • Deep Dive, Snorkel's predecessor, was used in the DARPA Memex project to combat human trafficking.
  • It addressed the challenge of extracting structured information from unstructured data like websites.
ANECDOTE

Clinician Hesitancy

  • Clinicians were hesitant to adopt new ML models because of the large hand-labeled training sets they required.
  • This highlighted the need for easier ways to create training data.