By Thomas Kwa et al.
We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has increased exponentially and consistently over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, AI agents will be able to independently complete a large fraction of software tasks that currently take humans days or weeks.
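To make the extrapolation concrete, here is a minimal sketch of the arithmetic behind a 7-month doubling time. Only the doubling time comes from the text above. The 1-hour starting task length, the target lengths, and the function names are illustrative assumptions, not figures from the source.

```python
import math

# From the text above: "a doubling time of around 7 months".
DOUBLING_TIME_MONTHS = 7.0

# ASSUMPTION (not from the source): agents today complete tasks that
# take a human about 1 hour. Change this to explore other starting points.
START_HOURS = 1.0


def horizon_after(months, start_hours=START_HOURS,
                  doubling_months=DOUBLING_TIME_MONTHS):
    """Task length in hours after `months` of pure exponential growth."""
    return start_hours * 2 ** (months / doubling_months)


def months_to_reach(target_hours, start_hours=START_HOURS,
                    doubling_months=DOUBLING_TIME_MONTHS):
    """Months until the task length reaches `target_hours`."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    # Hypothetical targets, expressed in human working hours.
    targets = [("one work day", 8.0),
               ("one work week", 40.0),
               ("one work month", 160.0)]
    for label, hours in targets:
        m = months_to_reach(hours)
        print(f"{label} ({hours:.0f} h): about {m:.0f} months "
              f"({m / 12:.1f} years)")
```

Under these assumptions each target falls well within a decade, which matches the shape of the claim above. The result depends heavily on the assumed starting point and on the trend continuing unchanged.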
Source: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.