Eye On A.I.

#298 Ryan Kolln: How Appen Trains the World's Most Powerful AI Models

Nov 6, 2025
Ryan Kolln, CEO of Appen, discusses the critical role of human evaluation in training AI models. He explains why traditional benchmarks fall short, emphasizing the need for user-centered measures. Kolln highlights how curated human evaluators provide richer insights than random feedback, ensuring AI's cultural relevance through localized data. He also covers the evolution from supervised learning to large language model evaluations, and the synergy between AI evaluators and human annotators in enhancing quality control and model performance.
INSIGHT

Benchmarks Are Limited Predictors

  • Benchmarks give a snapshot but often fail to predict real user experience for LLMs.
  • Narrow benchmarks miss edge cases and cultural nuance essential for global performance.
INSIGHT

Human Rubrics Drive Real Feedback

  • Human evaluators judge responses on accuracy, format, grammar, harm, and bias using subjective rubrics.
  • Their written feedback often doubles as training data used to retrain and improve models (see the sketch below).
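To make the idea concrete, here is a minimal sketch, in Python, of what a rubric-scored evaluation record might look like and how it could be flattened into a training example for a reward or critique model. The episode does not describe Appen's actual schema; every field name, scale, and function here is an illustrative assumption.

    # Illustrative sketch only; not Appen's actual evaluation schema.
    from dataclasses import dataclass

    @dataclass
    class RubricScore:
        accuracy: int   # 1-5: factual correctness of the response
        format: int     # 1-5: adherence to the requested structure
        grammar: int    # 1-5: fluency and correctness of language
        harm: int       # 1-5: 5 = no harmful content observed
        bias: int       # 1-5: 5 = no unfair bias observed

    @dataclass
    class HumanEvaluation:
        prompt: str
        response: str
        scores: RubricScore
        written_feedback: str  # free-text critique, reusable as training data

    def to_training_example(ev: HumanEvaluation) -> dict:
        """Flatten one evaluation into a (prompt, response, reward, critique) record."""
        total = (ev.scores.accuracy + ev.scores.format + ev.scores.grammar
                 + ev.scores.harm + ev.scores.bias)
        return {
            "prompt": ev.prompt,
            "response": ev.response,
            "reward": total / 25.0,          # scalar target in [0.2, 1.0]
            "critique": ev.written_feedback,  # the "feedback doubles as training data" part
        }

The point of the sketch is that both the numeric rubric scores and the written critique survive into the training record, which is how human feedback can drive retraining rather than serving only as a one-off quality gate.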
ANECDOTE

Parenting Advice Shows Cultural Gaps

  • Appen recruited parenting experts globally and found that parenting advice differs widely across cultures.
  • That work ensures models provide locally relevant guidance rather than a Western average.