Unsupervised Learning with Jacob Effron

Ep 80: CEO of Surge AI Edwin Chen on Why Frontier Labs Are Diverging, RL Environments & Developing Model Taste

Dec 15, 2025
Edwin Chen, Founder and CEO of Surge AI, shares insights from his data infrastructure company, which supports major AI labs including OpenAI and Meta. He discusses the pitfalls of optimizing for clickbait benchmarks and how these practices give a misleading picture of model quality. Chen emphasizes rigorous human evaluations over benchmark gaming, and he critiques Silicon Valley's pivot culture. The conversation delves into the growing divergence in AI training approaches, advocating for a future with multiple opinionated models tailored to specific needs.
INSIGHT

Leaderboard Optimization Drives Clickbait Behavior

  • Optimizing for public leaderboards like LMArena produces flashy but shallow responses that voters prefer.
  • Models trained this way become more verbose, attention-grabbing, and prone to hallucination rather than genuinely more accurate.
ANECDOTE

A Team's Model Regressed Without Anyone Noticing

  • Edwin recounts investigating a team's models and discovering they had regressed over six to twelve months because of poor data and raters who never executed the code.
  • The team's training data contained flowery but incorrect code that raters never actually ran or validated.
ADVICE

Use Rigorous Human Evals, Not Quick Votes

  • Run rigorous human evaluations with expert, attentive raters to measure real-world model quality.
  • Use diverse, sophisticated prompts and trusted raters instead of anonymous quick votes to avoid Goodharting on benchmarks.