

ARC Prize v2 Launch! (Francois Chollet and Mike Knoop)
162 snips Mar 24, 2025
Francois Chollet, the AI researcher known for Keras and the ARC challenge, joins Mike Knoop, his collaborator on the ARC challenge, to launch the new version of the ARC Prize. They discuss how ARC v2 uses human calibration and adversarial selection so that even top LLMs struggle with it. The conversation covers the evolution from ARC v1 to v2, the complexities of AI task design, and the urgent need for rigorous testing methods to help close the gap between human and machine intelligence in the quest for artificial general intelligence.
AI Snips
o3's Surprise Performance
- OpenAI's o3 model achieved near-human performance on ARC v1, surprising Francois Chollet.
- This prompted a two-week testing sprint to understand o3's capabilities and their implications.
Training on ARC
- Training on ARC's training set is not cheating; the benchmark encourages it, because the training set exists to teach AI systems about the domain.
- The private dataset tests generalization and abstraction, which requires more than memorization.
Human Calibration
- Human testers, paid at a rate of $5 per task, could solve ARC v2 tasks within five minutes.
- Every v2 task was solved by at least two human testers within two attempts.
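
The calibration rule above (at least two humans, each within two attempts) can be read as a simple filter over tester results. The following is a minimal sketch of that reading, not the ARC Prize team's actual pipeline: the record layout, the function name `calibrated_tasks`, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Thresholds taken from the snip above; the names are illustrative.
MAX_ATTEMPTS = 2  # a tester must succeed within two attempts
MIN_SOLVERS = 2   # at least two distinct testers must succeed


def calibrated_tasks(records):
    """Return the task ids that pass the human-calibration rule.

    `records` is a hypothetical iterable of tuples:
    (task_id, tester_id, attempts_used, solved).
    """
    solvers = defaultdict(set)
    for task_id, tester_id, attempts_used, solved in records:
        if solved and attempts_used <= MAX_ATTEMPTS:
            solvers[task_id].add(tester_id)
    return {task for task, who in solvers.items() if len(who) >= MIN_SOLVERS}


# Made-up sample data, for illustration only.
records = [
    ("task_a", "tester_1", 1, True),
    ("task_a", "tester_2", 2, True),
    ("task_b", "tester_1", 2, True),
    ("task_b", "tester_3", 2, False),  # never solved: does not count
    ("task_c", "tester_2", 3, True),   # too many attempts: does not count
    ("task_c", "tester_3", 1, True),
]
kept = calibrated_tasks(records)
```

With this sample data only `task_a` passes, since it is the only task with two distinct testers who each succeeded within two attempts.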