The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The Fallacy of "Ground Truth" with Shayan Mohanty - #576

May 30, 2022
Shayan Mohanty, CEO of Watchful and former Facebook systems architect, discusses data-centric AI. He covers the complexities of data labeling and how techniques like active learning and weak supervision can make it more efficient. Shayan also explores the challenges organizations face with hand-labeled data and the biases it introduces, and argues for a more integrated approach to machine learning operations. He closes with the shift in mindset that adopting a data-centric approach requires.
INSIGHT

Big Data vs. Small Data

  • Data-centric AI is about choosing between big data and small data approaches.
  • Consider data diversity and potential bias when deciding how much data to use.
ADVICE

Active Learning for Small Data

  • Use active learning to identify the most valuable data points for labeling.
  • Train a model on labeled data and measure its uncertainty on unlabeled data.
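A minimal sketch of the loop described above, assuming uncertainty is measured as the entropy of a model's predicted probability. The 1-D logistic model, the toy data, and the names `train_logistic_1d` and `most_uncertain` are illustrative assumptions, not something stated in the episode.

```python
import math

def train_logistic_1d(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def entropy(p):
    """Binary entropy in bits; largest when p is 0.5 (model is unsure)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def most_uncertain(model, unlabeled, k):
    """Return the k unlabeled points with the highest predictive entropy."""
    ranked = sorted(unlabeled,
                    key=lambda x: entropy(predict_proba(model, x)),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: a handful of labeled points and an unlabeled pool.
labeled_x = [-2.0, -1.5, 1.5, 2.0]
labeled_y = [0, 0, 1, 1]
unlabeled = [-3.0, -0.1, 0.2, 2.5, 4.0]

model = train_logistic_1d(labeled_x, labeled_y)
to_label = most_uncertain(model, unlabeled, k=2)
print(to_label)  # the points nearest the decision boundary
```

In a full loop, the selected points would be sent to a human labeler, added to the labeled set, and the model retrained before the next round.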
INSIGHT

Weak Supervision for Large Data

  • Weak supervision uses labeling functions (e.g., regex) to label large datasets quickly.
  • It's effective for common cases but struggles with long-tail data.
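A rough sketch of regex-based labeling functions, assuming a hypothetical spam/ham task and a simple majority vote to combine them. The patterns, label names, and the `weak_label` helper are made up for illustration; majority vote is a simplification chosen to keep the example short.

```python
import re
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its pattern does not match

def lf_free_offer(text):
    return "spam" if re.search(r"\bfree\b|\bwinner\b", text, re.I) else ABSTAIN

def lf_urgent(text):
    return "spam" if re.search(r"\bact now\b|\burgent\b", text, re.I) else ABSTAIN

def lf_meeting(text):
    return "ham" if re.search(r"\bmeeting\b|\bagenda\b", text, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_free_offer, lf_urgent, lf_meeting]

def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

examples = [
    "You are a WINNER, claim your free prize, act now",
    "Agenda for tomorrow's meeting",
    "Grandma's pierogi recipe",      # long-tail text: no rule fires
    "Free lunch at the meeting",     # rules disagree
]
for text in examples:
    print(repr(text), "->", weak_label(text))
```

The third example shows the long-tail weakness noted above: text that no rule anticipates gets no label, so coverage depends on how well the rules match the common cases.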