MLOps.community

Greg Kamradt: Benchmarking Intelligence | ARC Prize

Jun 24, 2025
Greg Kamradt, a mentor for AI-centric developers and an expert in launching tech products, dives into the nuances of AI benchmarking. He discusses the challenges in creating effective benchmarks, highlighting the intriguing concept of puzzles that are easy for humans but hard for AI. The conversation covers compute tradeoffs and the philosophical implications of tracking AI progress towards AGI. Greg also shares insights on motivating participants in AI competitions and the evolving ARC framework for assessing intelligence in innovative ways.
INSIGHT

Human-Easy, AI-Hard Benchmarking

  • ARC-AGI benchmarks focus on problems that are easy for humans but hard for AI, in order to identify true progress toward AGI.
  • As long as humans can reliably solve tasks that AI systems fail, AGI has not yet arrived.
ANECDOTE

Testing OpenAI’s Unreleased Model

  • OpenAI approached ARC Prize to evaluate an unreleased model on the hidden test set, revealing performance that had not been publicly reported.
  • The model scored 87%, far above previously published scores, confirming unprecedented progress.
INSIGHT

Compute Tradeoff in AI Performance

  • Model performance scales with inference compute, in both money and time, creating a compute-performance tradeoff.
  • It is unclear whether performance can reach 100%, given exponentially growing compute costs and long runtimes.