Latent Space: The AI Engineer Podcast

Latent.Space
527 snips
Feb 6, 2026 • 1h 8min

The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

Myra Deng, Head of Product at Goodfire AI who turns interpretability research into production, and Mark Bissell, mechanistic interpretability engineer with Palantir roots, discuss making model internals actionable. They cover lightweight probes, token-level safety filters, real-time steering of huge models, post-training surgical edits, and applying these tools across language, vision, and genomics.
447 snips
Jan 28, 2026 • 1h 14min

🔬 Automating Science: World Models, Scientific Taste, Agent Loops — Andrew White

Andrew White, a former professor turned AI-for-science entrepreneur, co-founded Future House and Edison Scientific. He recounts building ChemCrow and Cosmos, red-teaming GPT-4 for chemistry, and automating hypothesis-to-experiment loops. Topics include scientific taste and why RLHF failed, world models as distilled scientific memory, lab-in-the-loop bottlenecks, and safety/dual-use tradeoffs.
517 snips
Jan 23, 2026 • 1h 32min

Captaining IMO Gold, Deep Think, On-Policy RL, Feeling the AGI in Singapore — Yi Tay

Yi Tay is a DeepMind researcher who co-led the IMO Gold project and built the Reasoning & AGI team in Singapore. He recounts training Gemini Deep Think, the live IMO Gold push, the shift from symbolic systems to end-to-end RL, debates on on-policy versus off-policy learning, and the role of self-consistency and data efficiency in unlocking reasoning.
1,295 snips
Jan 17, 2026 • 1h 13min

Brex’s AI Hail Mary — With CTO James Reggio

James Reggio, CTO of Brex and leader of their AI transformation, shares his journey from mobile engineer to fintech innovator. He discusses Brex's unique three-pillar AI strategy aimed at enhancing corporate workflows, operational compliance, and customer-oriented product features. Reggio reveals how SOP-driven agents outperform traditional reinforcement learning in automating processes like KYC and underwriting. He emphasizes empowering employees to create their own AI tools and the advantages of a multi-agent network architecture in financial operations.
652 snips
Jan 8, 2026 • 1h 18min

Artificial Analysis: Independent LLM Evals as a Service — with George Cameron and Micah Hill-Smith

Join George Cameron, co-founder of Artificial Analysis and benchmarking guru, along with Micah Hill-Smith, who crafted the evaluation methodology and unique benchmarks. They share their journey from a basement project to a vital tool for AI model assessment. Discover why independent evaluations matter, how their 'mystery shopper' strategy keeps benchmarks honest, and the innovative Omniscience index that prioritizes accurate responses. Learn about the evolving AI landscape and their predictions for future developments in benchmarking and transparency.
528 snips
Jan 6, 2026 • 24min

[State of Evals] LMArena's $1.7B Vision — Anastasios Angelopoulos, LMArena

Anastasios Angelopoulos, founder of LMArena, shares his journey from a Berkeley basement to a $100M valuation. He discusses why they chose to spin out as a company to scale their mission. The conversation dives into Arena's innovative approach to benchmarking AI models, the transparency of their public leaderboard, and their responses to critiques. Anastasios also reveals plans for expanding into new verticals like medicine and legal, the significance of community engagement, and the exciting shift to multimodal arenas.
556 snips
Jan 2, 2026 • 28min

[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al, Princeton

Kevin Wang, an undergraduate researcher at Princeton, and Ishaan Javali, his co-author, discuss their groundbreaking work on scaling reinforcement learning networks to 1,000 layers deep, a feat previously deemed impossible. They dive into the shift from traditional reward maximization to self-supervised learning methods, highlighting architectural breakthroughs like residual connections. The duo also explores efficiency trade-offs, data collection techniques using JAX, and the implications for robotics, positioning their approach as a radical shift in reinforcement learning objectives.
387 snips
Dec 31, 2025 • 18min

[State of Code Evals] After SWE-bench, CodeClash & SOTA Coding Benchmarks recap — John Yang

Join John Yang, a Stanford PhD student and the mind behind SWE-bench and CodeClash, as he shares insights from the cutting-edge world of AI coding benchmarks. Discover how SWE-bench went from zero to industry standard in mere months, the limitations of traditional unit tests, and the innovative long-horizon tournaments of CodeClash. Yang dives into the debate around Tau-bench's 'impossible tasks' and explores the balance between autonomous agents and interactive workflows. Get ready for a glimpse into the future of human-AI collaboration!
428 snips
Dec 31, 2025 • 28min

[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI

In this engaging discussion, Josh McGrath, a post-training researcher at OpenAI, dives into the evolution of AI models from GPT-4.1 to GPT-5.1. He highlights the importance of data quality over optimization methods and explains why RLHF and RLVR are simply variations of policy gradients. Josh also shares insights on how the shopping model enhances user experience with personality toggles and the complexities involved in scaling reinforcement learning. His call for more engineers proficient in both distributed systems and ML further emphasizes the need for interdisciplinary expertise in advancing AI.
562 snips
Dec 30, 2025 • 45min

[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor

In this engaging discussion, Ashvin Nair, a researcher with a rich background in robotics and AI, shares his journey from OpenAI to Cursor. He describes the transition from robotics challenges to the quicker impact of language models. Ashvin delves into the economic dynamics of LLMs, the importance of co-designing models and products, and the complexities of continual learning. He also explores the limits of scaling and the need for specialized models, offering insights into the future of coding automation and the evolving landscape of AI.
