Waymo Madness in SF! Why robotaxis clogged the streets | E2227

This Week in Startups

Where Benchmarks Fall Short for Agents

Jason proposes new ways to evaluate models: agent-style tests built on real-world tasks, and tests of novel problem solving.

Segment starts at 42:46