Gradient Dissent: Conversations on AI

The Startup Powering The Data Behind AGI

Sep 16, 2025
Edwin Chen, CEO of Surge AI and a veteran ML engineer from Google, Twitter, and Facebook, dives into the challenges of quality data labeling in AI. He discusses Surge's approach, which prioritizes high-skill data collection over traditional methods. Chen critiques the notion of inter-annotator agreement, especially in complex tasks like poetry and math. The conversation also touches on the role of high-quality data in advancing artificial general intelligence and on how data practices must evolve for future AI breakthroughs.
ANECDOTE

Early Craigslist Labeling Failure

  • Edwin described labeling at Twitter with two Craigslist hires who produced junk labels, which forced him to label the data himself.
  • That experience motivated him to build a better data platform rather than trust ad-hoc spreadsheets and slow workers.
INSIGHT

Quality Over Commodity Labeling

  • Commodity labeling fits only low-skill tasks like bounding boxes where human variance is minimal.
  • High-complexity language and behavior tasks require a platform built for quality, not just scale.
ANECDOTE

Built From A Trusted Network

  • Edwin recruited an initial crowd from his existing network of people who've cared about data labeling for years.
  • Early customers were engineers and research scientists at companies like Airbnb and Twitter who demanded higher-quality data.