

Greg Kamradt: Benchmarking Intelligence | ARC Prize
Jun 24, 2025
Greg Kamradt, a mentor for AI-centric developers and an expert in launching tech products, dives into the nuances of AI benchmarking. He discusses the challenges in creating effective benchmarks, highlighting the intriguing concept of puzzles that are easy for humans but hard for AI. The conversation covers compute tradeoffs and the philosophical implications of tracking AI progress towards AGI. Greg also shares insights on motivating participants in AI competitions and the evolving ARC framework for assessing intelligence in innovative ways.
Human-Easy, AI-Hard Benchmarking
- ARC-AGI benchmarks focus on problems that are easy for humans but hard for AI, as a way to identify true progress toward AGI.
- As long as humans can solve tasks that AI systems cannot, AGI does not yet exist.
Testing OpenAI’s Unreleased Model
- OpenAI approached ARC Prize to test a new, unreleased model on the hidden evaluation set, revealing performance that had not been reported publicly.
- The model scored 87%, far above public scores, validating unprecedented progress.
Compute Tradeoff in AI Performance
- Model performance scales with inference compute, in both money and time, creating a compute-performance tradeoff.
- It's unclear whether performance can reach 100%, given exponentially growing compute costs and long runtimes (see the illustrative sketch below).
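A minimal sketch of this tradeoff, assuming a made-up log-linear relationship between inference spend and benchmark score. The `estimated_score` function and its constants are illustrative assumptions, not figures from the episode; the point is only that each fixed gain in score demands a multiplicative increase in compute.

```python
import math

# Illustrative sketch only: assumes benchmark score grows log-linearly
# with inference spend. All constants below are hypothetical.

def estimated_score(compute_dollars: float,
                    base_score: float = 20.0,
                    gain_per_decade: float = 25.0) -> float:
    """Hypothetical benchmark score (%) as a function of inference spend."""
    score = base_score + gain_per_decade * math.log10(max(compute_dollars, 1.0))
    return min(score, 100.0)

# Under this assumption, each 10x increase in spend buys a roughly fixed
# gain, so closing the last few points toward 100% costs exponentially more.
for dollars in (1, 10, 100, 1_000, 10_000):
    print(f"${dollars:>6}: ~{estimated_score(dollars):.0f}%")
```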