#232 - ChatGPT Ads, Thinking Machines Drama, STEM

Last Week in AI

AgencyBench: Long-Horizon Agent Benchmark

Explanation of AgencyBench's million-token tasks, tool calls, and closed vs open model performance gap.

Play episode from 54:51