
BlueDot Narrated: Measuring AI Ability to Complete Long Tasks
Sep 9, 2025
15:06
Audio versions of blogs and papers from BlueDot courses.
By Thomas Kwa et al.
We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks.
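The extrapolation in the summary above is ordinary doubling-time arithmetic, and a minimal sketch can make it concrete. The only figure taken from the summary is the doubling time of around 7 months. The function names, the 1-hour starting horizon, and the 40-hour work week are illustrative assumptions, not values from the source.

```python
import math

# "Around 7 months" is the doubling time quoted in the summary.
DOUBLING_TIME_MONTHS = 7.0


def horizon_after(months: float, start_horizon_hours: float,
                  doubling_time_months: float = DOUBLING_TIME_MONTHS) -> float:
    """Task length (in hours) after `months` of steady exponential growth."""
    return start_horizon_hours * 2 ** (months / doubling_time_months)


def months_until(target_hours: float, start_horizon_hours: float,
                 doubling_time_months: float = DOUBLING_TIME_MONTHS) -> float:
    """Months until the task length reaches `target_hours` on the same trend."""
    return doubling_time_months * math.log2(target_hours / start_horizon_hours)


if __name__ == "__main__":
    start = 1.0       # hypothetical starting horizon in hours (assumption)
    work_week = 40.0  # assumed length of one human work week, in hours
    print(f"{months_until(work_week, start):.1f} months to reach a 40-hour task")
    print(f"{horizon_after(60, start):.0f} hours after 5 years")
```

With these assumed inputs, going from 1-hour tasks to 40-hour tasks takes about 37 months, which is consistent with the "under a decade" framing. The result is only as good as the assumption that the trend continues unchanged.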
Source:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
A podcast by BlueDot Impact.
