
LessWrong (Curated & Popular)

“METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman

Apr 7, 2025
Zach Stein-Perlman, author of a thought-provoking post on measuring AI task performance, discusses METR's metric for evaluating AI capabilities: the length of tasks models can complete. He reports that the length of tasks AI can complete has been doubling approximately every seven months for the past six years. The conversation highlights the implications of this rapid progress, the challenges AI still faces with longer tasks, and the urgency of preparing for a future where AI could autonomously handle significant work typically done by humans.
11:09

Podcast summary created with Snipd AI

Quick takeaways

  • AI agents are demonstrating exponential growth in the length of tasks they can complete, with significant implications for future automation across industries.
  • Despite advancements, the correlation between AI model performance and real-world task success remains complex and requires further exploration.

Deep dives

Measuring AI Task Completion

Measuring AI performance by the length of tasks a model can complete reveals a clear trend over recent years: generalist AI agents are handling increasingly long tasks autonomously, with the completable task length doubling roughly every seven months. Current models reliably handle tasks that take human professionals under four minutes, but they struggle with longer work, achieving less than 10% success on tasks that take professionals more than four hours. Framing capability in terms of task length helps clarify the relationship between benchmark performance and real-world utility, and underscores the need for quantifiable benchmarks.
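To make the trend concrete, here is a minimal Python sketch of the exponential extrapolation the episode describes. The 60-minute starting horizon and the projection points are illustrative assumptions for this sketch, not figures from METR's paper; only the seven-month doubling time comes from the episode.

```python
# Illustrative extrapolation of the doubling trend described above.
DOUBLING_TIME_MONTHS = 7  # doubling time reported in the episode

def task_length_after(months: float, current_length_minutes: float) -> float:
    """Projected completable task length (in minutes) after `months`,
    assuming the exponential doubling trend continues unchanged."""
    return current_length_minutes * 2 ** (months / DOUBLING_TIME_MONTHS)

# Assumed starting point: a 60-minute task horizon (hypothetical figure).
# 7 months -> ~2 hours; 42 months (6 doublings) -> ~64 hours.
for m in (0, 7, 14, 42):
    print(f"{m:2d} months: ~{task_length_after(m, 60):.0f} minutes")
```

The key property the sketch shows is that a constant doubling time compounds quickly: six doublings turn an hour-long task horizon into a multi-day one.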
