Will Brown, the reasoning research lead at Prime Intellect, shares his insights on the latest advancements in multi-turn reasoning for LLM agents. He discusses his recent paper on turn-level credit assignment and what it means for practical AI agent applications. The conversation covers challenges in model training, ethical dilemmas, and managing token budgets for efficient performance. Brown also speculates on the future of AI safety and the evolving capabilities of models like Claude 4, diving into their real-world implications and complexities.
Duration: 39:57
Reasoning as a Step to Agents
The future of AI innovation lies in building practical agents, not just better reasoning models.
Reasoning improvements act as stepping stones towards more capable and autonomous AI agents.
Extended Thinking as Tool Use
Anthropic treats extended thinking as a form of tool use to enhance model problem solving.
The model uses thinking as a way to decide actions, akin to a brain dump helping next steps.
Claude 4 Progress and Trustworthiness
Claude 4 shows linear progress without a paradigm shift but improves on reducing reward hacking.
Better adherence to task and less extraneous output improve coding trustworthiness in newer models.
In 'Lord of the Flies', William Golding tells the story of a group of British schoolboys who are stranded on a deserted island after their plane crashes. The novel follows their attempts to govern themselves and the gradual descent into chaos and savagery. The story is an allegory that explores themes of human nature, morality, leadership, and the fragility of civilization. Key characters include Ralph, who represents order and democracy; Jack, who symbolizes power and violence; and Piggy, the voice of reason. The novel highlights the tension between the desire for civilization and the primal savagery that lies beneath the surface of human society.
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT5 ships this summer).
Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previews his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.
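To make the core idea behind turn-level credit assignment concrete, here is a minimal sketch contrasting it with trajectory-level assignment. This is an illustration of the general concept, not the paper's actual method: the function names, reward shapes, and discounting scheme are all assumptions for demonstration.

```python
# Hypothetical sketch: trajectory-level vs turn-level credit assignment
# for a multi-turn agent episode. All names and reward structures here
# are illustrative, not taken from the paper.

def trajectory_level_advantages(turn_count, episode_reward, baseline=0.0):
    """Every turn receives the same advantage: the whole-episode reward.
    Early turns are credited/blamed uniformly for the final outcome."""
    adv = episode_reward - baseline
    return [adv] * turn_count

def turn_level_advantages(turn_rewards, gamma=1.0, baseline=0.0):
    """Each turn's advantage is the discounted return from that turn
    onward, so credit is localized to the turns that produced it."""
    returns = []
    running = 0.0
    for r in reversed(turn_rewards):  # accumulate return backwards
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return [g - baseline for g in returns]

if __name__ == "__main__":
    # A 3-turn episode where only the final turn earns reward.
    print(trajectory_level_advantages(3, episode_reward=1.0))
    # -> [1.0, 1.0, 1.0]: all turns treated identically.
    print(turn_level_advantages([0.0, 0.0, 1.0], gamma=0.9))
    # Later turns get more credit than earlier ones.
```

The contrast is the point: with a single episode-level reward, trajectory-level assignment cannot distinguish a good early turn from a bad one, while per-turn returns give the policy gradient a finer-grained signal.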
Chapters
00:00 Introduction and Episode Overview
02:01 Discussion on Claude 4 and its Features
04:31 Reasoning and Tool Use in AI Models
07:01 Extended Thinking in Claude and Model Differences
09:31 Speculation on Claude's Extended Thinking
11:01 Challenges and Controversies in AI Model Training
13:31 Technical Highlights and Code Trustworthiness
16:01 Token Costs and Incentives in AI Models
18:31 Thinking Budgets and AI Effort
21:01 Safety and Ethics in AI Model Development
23:31 Anthropic's Approach to AI Safety
26:01 LLM Arena and Evaluation Challenges
28:31 Developing Taste and Direction in AI Research
31:01 Recent Research and Multi-Turn RL
33:31 Tools and Incentives in AI Model Development