

Snorkel: A System for Fast Training Data Creation with Alex Ratner - TWiML Talk #270
May 30, 2019
In this discussion, Alex Ratner, a Ph.D. student at Stanford and creator of Snorkel, dives into new approaches to data labeling. He explains how Snorkel simplifies the creation of training data through weakly supervised learning, replacing much of the traditional hand-labeling process. Ratner shares real-world applications, including collaborations with companies like Google. The conversation also covers the complexities of writing labeling functions, the influence of human biases in machine learning, and future directions such as Snorkel MeTaL for multitask learning.
Training Data Bottleneck
- Deep learning models need lots of labeled training data, which is a bottleneck for their application.
- Snorkel allows subject matter experts to provide higher-level inputs like rules or patterns to train these models.
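The idea of replacing hand labels with rules can be sketched in plain Python. The snippet below is an illustrative assumption, not Snorkel's actual API: each labeling function votes on an example or abstains, and here the votes are combined by simple majority, whereas Snorkel itself learns a generative model over the labeling functions' outputs to estimate their accuracies.

```python
# Minimal sketch of Snorkel-style labeling functions (plain Python, not the
# Snorkel library). The rules, data, and majority-vote combiner are
# illustrative assumptions for a toy sentiment task.
from collections import Counter

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_contains_great(text):
    # Rule from a hypothetical subject matter expert: "great" leans positive.
    return POSITIVE if "great" in text.lower() else ABSTAIN

def lf_contains_awful(text):
    # "awful" leans negative.
    return NEGATIVE if "awful" in text.lower() else ABSTAIN

def lf_exclamation(text):
    # A weaker, noisier signal: exclamation marks lean positive.
    return POSITIVE if "!" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_great, lf_contains_awful, lf_exclamation]

def weak_label(text):
    """Combine labeling-function votes by majority, ignoring abstentions."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

The point of the sketch is that no example was hand-labeled: the "training labels" come entirely from the experts' rules, which is what lets Snorkel scale label creation.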
Deep Dive and Memex
- Deep Dive, Snorkel's predecessor, was used in the DARPA Memex project to combat human trafficking.
- It addressed the challenge of extracting structured information from unstructured data like websites.
Clinician Hesitancy
- Clinicians were hesitant to adopt new ML models due to the large labeled training sets required.
- This highlighted the need for easier training data creation methods.