The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The Fallacy of "Ground Truth" with Shayan Mohanty - #576

May 30, 2022

Shayan Mohanty, CEO of Watchful and former Facebook systems architect, discusses data-centric AI. He covers the complexities of data labeling and how techniques such as active learning and weak supervision can make labeling more efficient. He also explores the challenges organizations face with hand-labeled data and the biases it introduces, and argues for a more integrated approach in machine learning operations, one that requires a shift in mindset to adopt.

AI Snips

INSIGHT

Big Data vs. Small Data

Data-centric AI is about choosing between big data and small data approaches.
Consider data diversity and potential bias when deciding how much data to use.

ADVICE

Active Learning for Small Data

Use active learning to identify the most valuable data points for labeling.
Train a model on labeled data and measure its uncertainty on unlabeled data.
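
The two steps above can be sketched as uncertainty sampling. This is a minimal illustration, not the method discussed in the episode: the one-dimensional logistic model, the toy data, and the names train_logistic and select_for_labeling are all assumptions made for the example. It fits a model on a few labeled points, scores each unlabeled point by the entropy of the predicted probability, and returns the points the model is least sure about.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent (toy model)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def binary_entropy(p):
    """Uncertainty of a binary prediction; largest when p is near 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_for_labeling(model, unlabeled, k):
    """Return the k unlabeled points with the highest predictive entropy."""
    w, b = model
    scored = [(binary_entropy(sigmoid(w * x + b)), x) for x in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]

# Hypothetical data: four labeled points and five unlabeled candidates.
labeled_x = [-2.0, -1.5, 1.5, 2.0]
labeled_y = [0, 0, 1, 1]
unlabeled_x = [-3.0, -0.1, 0.2, 2.5, 4.0]

model = train_logistic(labeled_x, labeled_y)
queries = select_for_labeling(model, unlabeled_x, k=2)
print(queries)  # the points closest to the decision boundary
```

In a real loop, the selected points would be sent to an annotator, added to the labeled set, and the model retrained before the next round of selection.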

INSIGHT

Weak Supervision for Large Data

Weak supervision uses labeling functions (e.g., regex) to label large datasets quickly.
It's effective for common cases but struggles with long-tail data.
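
A rough sketch of regex-based labeling functions follows. The support-message task, the label names, and the majority-vote combiner are assumptions for illustration only; weak supervision systems often combine votes with a learned label model rather than a simple majority. Each function either votes for a label or abstains, and the third message shows the long-tail problem: no pattern fires, so it stays unlabeled.

```python
import re
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its pattern does not match

# Hypothetical labeling functions for a made-up "complaint" vs. "praise" task.
def lf_refund(text):
    return "complaint" if re.search(r"\brefund\b", text, re.I) else ABSTAIN

def lf_broken(text):
    return "complaint" if re.search(r"\b(broken|defective)\b", text, re.I) else ABSTAIN

def lf_thanks(text):
    return "praise" if re.search(r"\b(thanks|thank you|love)\b", text, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_refund, lf_broken, lf_thanks]

def weak_label(text):
    """Combine labeling-function votes by simple majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

docs = [
    "I want a refund, the charger arrived broken.",
    "Thanks, I love the new update!",
    "The app logs me out every time I rotate my phone.",  # long-tail case
]
labels = [weak_label(d) for d in docs]
print(labels)
```

Messages that every function abstains on are natural candidates for hand labeling or for a new labeling function.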
