LoCoBench: Long-Context Software Engineering Benchmark

Andrey describes LoCoBench's eight long-context software engineering tasks and its new metrics; Michelle emphasizes the need for realistic benchmarks for code tasks.

Episode notes

Our 221st episode with a summary and discussion of last week's big AI news!

Recorded on 09/19/2025

Note: we transitioned to a new RSS feed and it seems this episode did not make it there, so it may be posted about 2 weeks past the release date.

Hosted by Andrey Kurenkov and co-hosted by Michelle Lee

Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

OpenAI releases a new version of Codex integrated with GPT-5, enhancing coding capabilities and aiming to compete with other AI coding tools like Claude Code.
Significant updates in the robotics sector include new ventures in humanoid robots from companies like Figure AI and China’s Unitree, as well as expansions in robotaxi services from Tesla and Amazon’s Zoox.
New open-source models and research advancements are discussed, including Google DeepMind's self-improving foundation model for robotics and a physics foundation model aimed at generalizing across various physical systems.
Legal battles continue to surface in the AI landscape with Warner Bros. suing Midjourney for copyright violations and Rolling Stone's publisher suing Google over AI-generated content summaries, highlighting challenges in AI governance and ethics.

Timestamps:

(00:00:10) Intro / Banter
Tools & Apps
(00:02:33) OpenAI upgrades Codex with a new version of GPT-5
(00:04:02) Google Injects Gemini Into Chrome as AI Browsers Go Mainstream | WIRED
(00:06:14) Anthropic’s Claude can now make you a spreadsheet or slide deck. | The Verge
(00:07:12) Luma AI's New Ray3 Video Generator Can 'Think' Before Creating - CNET
Applications & Business
(00:08:32) OpenAI secures Microsoft's blessing to transition its for-profit arm | TechCrunch
(00:10:31) Microsoft to lessen reliance on OpenAI by buying AI from rival Anthropic | TechCrunch
(00:12:00) Figure AI passes $1B with Series C funding toward humanoid robot development - The Robot Report
(00:13:52) China’s Unitree plans $7 billion IPO valuation as humanoid robot race heats up
(00:15:45) Tesla's robotaxi plans for Nevada move forward with testing permit | TechCrunch
(00:17:48) Amazon's Zoox jumps into U.S. robotaxi race with Las Vegas launch
(00:19:27) Replit hits $3B valuation on $150M annualized revenue | TechCrunch
(00:21:14) Perplexity reportedly raised $200M at $20B valuation | TechCrunch
Projects & Open Source
(00:22:08) [2509.07604] K2-Think: A Parameter-Efficient Reasoning System
(00:24:31) [2509.09614] LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Research & Advancements
(00:28:17) [2509.15155] Self-Improving Embodied Foundation Models
(00:31:47) [2509.13805] Towards a Physics Foundation Model
(00:34:26) [2509.12129] Embodied Navigation Foundation Model
Policy & Safety
(00:37:49) Anthropic endorses California's AI safety bill, SB 53 | TechCrunch
(00:40:12) Warner Bros. Sues Midjourney, Joins Studios' AI Copyright Battle
(00:42:02) Rolling Stone Publisher Sues Google Over AI Overview Summaries

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
