MLOps.community

Greg Kamradt: Benchmarking Intelligence | ARC Prize

Jun 24, 2025
Greg Kamradt, a mentor for AI-centric developers and an expert in launching tech products, dives into the nuances of AI benchmarking. He discusses the challenges in creating effective benchmarks, highlighting the intriguing concept of puzzles that are easy for humans but hard for AI. The conversation covers compute tradeoffs and the philosophical implications of tracking AI progress towards AGI. Greg also shares insights on motivating participants in AI competitions and the evolving ARC framework for assessing intelligence in innovative ways.
INSIGHT

Human-Easy, AI-Hard Benchmarking

  • ARC-AGI benchmarks focus on problems that are easy for humans but hard for AI, in order to identify true progress toward AGI.
  • As long as humans can reliably solve tasks that AI systems fail, AGI has not yet arrived.
ANECDOTE

Testing OpenAI’s Unreleased Model

  • OpenAI approached ARC Prize to evaluate an unreleased model on the hidden test set, revealing performance that had not been publicly reported.
  • The model scored 87%, far above previously published scores, confirming unprecedented progress.
INSIGHT

Compute Tradeoff in AI Performance

  • Model performance scales with inference compute, in both money and time, creating a compute-performance tradeoff.
  • It is unclear whether performance can reach 100%, given exponentially growing compute costs and long runtimes.