

Latent Space: The AI Engineer Podcast
Latent.Space
The podcast by and for AI Engineers! In 2025, over 10 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover the Foundation Models changing every domain — Code Generation, Multimodality, AI Agents, GPU Infra, and more — directly from the founders, builders, and thinkers pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to your first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always at https://latent.space
Episodes

527 snips
Feb 6, 2026 • 1h 8min
The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI
Myra Deng, Head of Product at Goodfire AI who turns interpretability research into production, and Mark Bissell, mechanistic interpretability engineer with Palantir roots, discuss making model internals actionable. They cover lightweight probes, token-level safety filters, real-time steering of huge models, post-training surgical edits, and applying these tools across language, vision, and genomics.

447 snips
Jan 28, 2026 • 1h 14min
🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White
Andrew White, a former professor turned AI-for-science entrepreneur, co-founded Future House and Edison Scientific. He recounts building ChemCrow and Cosmos, red-teaming GPT-4 for chemistry, and automating hypothesis-to-experiment loops. Topics include scientific taste and why RLHF failed, world models as distilled scientific memory, lab-in-the-loop bottlenecks, and safety/dual-use tradeoffs.

517 snips
Jan 23, 2026 • 1h 32min
Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay
Yi Tay, a DeepMind researcher, co-led the IMO Gold project and built the Reasoning & AGI team in Singapore. He recounts training Gemini Deep Think, the live IMO Gold push, the shift from symbolic systems to end-to-end RL, debates over on-policy versus off-policy learning, and the role of self-consistency and data efficiency in unlocking reasoning.

1,295 snips
Jan 17, 2026 • 1h 13min
Brex’s AI Hail Mary — With CTO James Reggio
James Reggio, CTO of Brex and leader of their AI transformation, shares his journey from mobile engineer to fintech innovator. He discusses Brex's unique three-pillar AI strategy aimed at enhancing corporate workflows, operational compliance, and customer-oriented product features. Reggio reveals how SOP-driven agents outperform traditional reinforcement learning in automating processes like KYC and underwriting. He emphasizes empowering employees to create their own AI tools and the advantages of a multi-agent network architecture in financial operations.

652 snips
Jan 8, 2026 • 1h 18min
Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah Hill-Smith
Join George Cameron, co-founder of Artificial Analysis and benchmarking guru, along with Micah Hill-Smith, who crafted the evaluation methodology and unique benchmarks. They share their journey from a basement project to a vital tool for AI model assessment. Discover why independent evaluations matter, how their 'mystery shopper' strategy keeps benchmarks honest, and the innovative Omniscience index that prioritizes accurate responses. Learn about the evolving AI landscape and their predictions for future developments in benchmarking and transparency.

528 snips
Jan 6, 2026 • 24min
[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena
Anastasios Angelopoulos, founder of LMArena, shares his journey from a Berkeley basement to a $100M valuation. He discusses why they chose to spin out as a company to scale their mission. The conversation dives into Arena's innovative approach to benchmarking AI models, the transparency of their public leaderboard, and their responses to critiques. Anastasios also reveals plans for expanding into new verticals like medicine and legal, the significance of community engagement, and the exciting shift to multimodal arenas.

556 snips
Jan 2, 2026 • 28min
[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al, Princeton
Kevin Wang, an undergraduate researcher at Princeton, and Ishaan Javali, his co-author, discuss their groundbreaking work on scaling reinforcement learning networks to 1,000 layers deep, a feat previously deemed impossible. They dive into the shift from traditional reward maximization to self-supervised learning methods, highlighting architectural breakthroughs like residual connections. The duo also explores efficiency trade-offs, data collection techniques using JAX, and the implications for robotics, positioning their approach as a radical shift in reinforcement learning objectives.

387 snips
Dec 31, 2025 • 18min
[State of Code Evals] After SWE-bench, CodeClash & SOTA Coding Benchmarks recap — John Yang
Join John Yang, a Stanford PhD student and the mind behind SWE-bench and CodeClash, as he shares insights from the cutting-edge world of AI coding benchmarks. Discover how SWE-bench went from zero to industry standard in mere months, the limitations of traditional unit tests, and the innovative long-horizon tournaments of CodeClash. Yang dives into the debate around Tau-bench's 'impossible tasks' and explores the balance between autonomous agents and interactive workflows. Get ready for a glimpse into the future of human-AI collaboration!

428 snips
Dec 31, 2025 • 28min
[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
In this engaging discussion, Josh McGrath, a post-training researcher at OpenAI, dives into the evolution of AI models from GPT-4.1 to GPT-5.1. He highlights the importance of data quality over optimization methods and explains why RLHF and RLVR are simply variations of policy gradients. Josh also shares insights on how the shopping model enhances user experience with personality toggles and the complexities involved in scaling reinforcement learning. His call for more engineers proficient in both distributed systems and ML further emphasizes the need for interdisciplinary expertise in advancing AI.

562 snips
Dec 30, 2025 • 45min
[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor
In this engaging discussion, Ashvin Nair, a researcher with a rich background in robotics and AI, shares his journey from OpenAI to Cursor. He highlights the transition from robotic challenges to the quicker impact of language models. Ashvin delves into the economic dynamics of LLMs, the importance of co-designing models and products, and the complexities of continual learning. He also explores the limitations of scaling and the need for specialized models, offering insights into the future of coding automation and the evolving landscape of AI.


