#222 - Sora 2, Sonnet 4.5, Vibes, Thinking Machines

Last Week in AI

SWE-Bench Pro: New Long-Horizon Software Benchmark

The hosts explain how SWE-Bench Pro is designed around realistic, long-horizon software engineering tasks, and why stronger benchmarks matter for evaluating agentic coding.
