
Demis Hassabis on shipping momentum, better evals and world models
Google AI: Release Notes
Game Arena's Scaling
- Game Arena's tests automatically get harder as AI systems improve, unlike benchmarks such as AIME or GPQA, where humans must create increasingly difficult questions.
- Each game is unique because the players create it as they play, which benefits testing by preventing overfitting to training data.