
AI #139: The Overreach Machines
Don't Worry About the Vase Podcast
Holistic Agent Leaderboard and Evaluation Findings
Zvi summarizes HAL evaluation methods and surprising results about reasoning effort and agent shortcuts.


