Latent Space: The AI Engineer Podcast

[AIEWF Preview] Multi-Turn RL for Multi-Hour Agents — with Will Brown, Prime Intellect

May 23, 2025
Will Brown, the reasoning research lead at Prime Intellect, shares his insights on the latest advancements in multi-turn reasoning for LLM agents. He discusses his recent paper on turn-level credit assignment, shedding light on the importance of practical AI agent applications. The conversation covers challenges in model training, ethical dilemmas, and managing token budgets for efficient performance. Brown also speculates on the future of AI safety and the evolving capabilities of models like Claude 4, diving into their real-world implications and complexities.

Reasoning as a Step to Agents

  • The future of AI innovation lies in building practical agents, not just better reasoning models.
  • Reasoning improvements act as stepping stones towards more capable and autonomous AI agents.

Extended Thinking as Tool Use

  • Anthropic treats extended thinking as a form of tool use that enhances the model's problem solving.
  • The model uses thinking to decide on its actions, much like a brain dump that informs the next steps.

Claude 4 Progress and Trustworthiness

  • Claude 4 shows linear progress rather than a paradigm shift, but it does reduce reward hacking.
  • Better adherence to the task and less extraneous output make newer models more trustworthy for coding.