
AI #135: OpenAI Shows Us The Money
Don't Worry About the Vase Podcast
00:00
SWE Bench Pro Raises Coding Agent Bar
Introduces SWE Bench Pro and explains how it tests real, enterprise-grade, multi-file coding tasks, revealing that current agents perform poorly.