

The Fallacy of "Ground Truth" with Shayan Mohanty - #576
May 30, 2022
Shayan Mohanty, CEO of Watchful and former Facebook systems architect, discusses data-centric AI. He covers the complexities of data labeling and how techniques like active learning and weak supervision can make labeling more efficient. Shayan also explores the challenges organizations face with hand-labeled data and the biases that arise from it, emphasizing the need for a more integrated approach in machine learning operations. He also highlights the mindset shift required to adopt a data-centric strategy.
AI Snips
Big Data vs. Small Data
- Data-centric AI is about choosing between big data and small data approaches.
- Consider data diversity and potential bias when deciding how much data to use.
Active Learning for Small Data
- Use active learning to identify the most valuable data points for labeling.
- Train a model on labeled data and measure its uncertainty on unlabeled data.
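The two steps above can be sketched in a few lines of Python. This is only an illustration of uncertainty sampling, not the method described in the episode: the toy 1-D data, the hand-rolled logistic regression, the entropy-based uncertainty score, and every function name here are assumptions made for the example.

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit a 1-D logistic regression p(y=1|x) = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Predicted probability of the positive class for one point."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def entropy(p):
    """Binary entropy in bits; it peaks at 1.0 when p = 0.5 (maximum uncertainty)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_for_labeling(model, unlabeled, k):
    """Return the k unlabeled points the model is least sure about."""
    ranked = sorted(unlabeled, key=lambda x: entropy(predict_proba(model, x)), reverse=True)
    return ranked[:k]

# Hypothetical toy data: a small labeled seed set and an unlabeled pool.
labeled_x = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
labeled_y = [0, 0, 0, 1, 1, 1]
unlabeled = [-3.0, -0.1, 0.2, 2.5, 0.05]

model = train_logistic(labeled_x, labeled_y)
queries = select_for_labeling(model, unlabeled, k=2)
print(queries)  # the points closest to the decision boundary
```

In a real loop, the selected points would be sent to a human labeler, added to the labeled set, and the model retrained before the next round of selection.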
Weak Supervision for Large Data
- Weak supervision uses labeling functions (e.g., regex) to label large datasets quickly.
- It's effective for common cases but struggles with long-tail data.
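As a rough sketch of the idea, the snippet below applies a few regex-based labeling functions to unlabeled text and combines their votes. The spam/ham task, the patterns, the function names, and the simple majority vote are all assumptions made for illustration; they are not taken from the episode, and production weak-supervision systems typically use a learned model rather than a plain vote to combine labeling functions.

```python
import re
from collections import Counter

# Hypothetical label values for a toy spam-detection task.
ABSTAIN, HAM, SPAM = -1, 0, 1

def lf_free_offer(text):
    """Vote SPAM if the text mentions 'free' or 'winner'."""
    return SPAM if re.search(r"\bfree\b|\bwinner\b", text, re.I) else ABSTAIN

def lf_url(text):
    """Vote SPAM if the text contains a link."""
    return SPAM if re.search(r"https?://", text) else ABSTAIN

def lf_greeting(text):
    """Vote HAM if the text opens with a personal greeting."""
    return HAM if re.search(r"^(hi|hello|hey)\b", text, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_free_offer, lf_url, lf_greeting]

def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

docs = [
    "You are a WINNER! Claim your free prize at http://example.com",
    "Hi Sam, are we still on for lunch tomorrow?",
    "Quarterly report attached.",
]
labels = [weak_label(d) for d in docs]
print(labels)  # → [1, 0, -1]
```

The third document matches none of the patterns, so every labeling function abstains. That is the long-tail problem in miniature: rules written for common cases leave unusual examples unlabeled.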