John Yang

Researcher and benchmark author focused on code evaluation and long-horizon AI coding agents; creator of SWE-bench and CodeClash and a Stanford PhD student working on human–AI collaboration and code evals.

Best podcasts with John Yang

Ranked by the Snipd community

318 snips

Dec 31, 2025 • 18min

[State of Code Evals] After SWE-bench, Code Clash & SOTA Coding Benchmarks recap — John Yang

Join John Yang, a Stanford PhD student and the mind behind SWE-bench and CodeClash, as he shares insights from the cutting-edge world of AI coding benchmarks. Discover how SWE-bench went from zero to industry standard in mere months, the limitations of traditional unit tests, and the innovative long-horizon tournaments of CodeClash. Yang dives into the debate around Tau-bench's 'impossible tasks' and explores the balance between autonomous agents and interactive workflows. Get ready for a glimpse into the future of human-AI collaboration!

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app