Will Brown, the reasoning research lead at Prime Intellect, shares his insights on the latest advancements in multi-turn reasoning for LLM agents. He discusses his recent paper on turn-level credit assignment and what it means for practical AI agent applications. The conversation covers challenges in model training, ethical dilemmas, and managing token budgets for efficient performance. Brown also speculates on the future of AI safety and the evolving capabilities of models like Claude 4, diving into their real-world implications and complexities.
Duration: 39:57
Reasoning as a Step to Agents
The future of AI innovation lies in building practical agents, not just better reasoning models.
Reasoning improvements act as stepping stones towards more capable and autonomous AI agents.
Extended Thinking as Tool Use
Anthropic treats extended thinking as a form of tool use to enhance model problem solving.
The model uses thinking as a way to decide actions, akin to a brain dump helping next steps.
Claude 4 Progress and Trustworthiness
Claude 4 shows linear progress without a paradigm shift but improves on reducing reward hacking.
Better adherence to task and less extraneous output improve coding trustworthiness in newer models.
In 'Lord of the Flies', William Golding tells the story of a group of British schoolboys who are stranded on a deserted island after their plane crashes. The novel follows their attempts to govern themselves and the gradual descent into chaos and savagery. The story is an allegory that explores themes of human nature, morality, leadership, and the fragility of civilization. Key characters include Ralph, who represents order and democracy; Jack, who symbolizes power and violence; and Piggy, the voice of reason. The novel highlights the tension between the desire for civilization and the primal savagery that lies beneath the surface of human society.
In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI io, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews; however, we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT5 ships this summer).
Will Brown’s talk at AIE NYC and open source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper on Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previews his AIEWF talk on Agentic RL for those with the temerity to power thru bad meetup audio.
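To make the core idea behind turn-level credit assignment concrete, here is a minimal sketch contrasting it with trajectory-level assignment. This is an illustration of the general concept, not the paper's actual method: the function names, reward shapes, and discounting scheme are all assumptions for demonstration.

```python
# Hypothetical sketch: trajectory-level vs turn-level credit assignment
# for a multi-turn agent episode. All names and reward structures here
# are illustrative, not taken from the paper.

def trajectory_level_advantages(turn_count, episode_reward, baseline=0.0):
    """Every turn receives the same advantage: the whole-episode reward.
    Early turns are credited/blamed uniformly for the final outcome."""
    adv = episode_reward - baseline
    return [adv] * turn_count

def turn_level_advantages(turn_rewards, gamma=1.0, baseline=0.0):
    """Each turn's advantage is the discounted return from that turn
    onward, so credit is localized to the turns that produced it."""
    returns = []
    running = 0.0
    for r in reversed(turn_rewards):  # accumulate return backwards
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return [g - baseline for g in returns]

if __name__ == "__main__":
    # A 3-turn episode where only the final turn earns reward.
    print(trajectory_level_advantages(3, episode_reward=1.0))
    # -> [1.0, 1.0, 1.0]: all turns treated identically.
    print(turn_level_advantages([0.0, 0.0, 1.0], gamma=0.9))
    # Later turns get more credit than earlier ones.
```

The contrast is the point: with a single episode-level reward, trajectory-level assignment cannot distinguish a good early turn from a bad one, while per-turn returns give the policy gradient a finer-grained signal.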
Chapters
00:00 Introduction and Episode Overview
02:01 Discussion on Claude 4 and its Features
04:31 Reasoning and Tool Use in AI Models
07:01 Extended Thinking in Claude and Model Differences
09:31 Speculation on Claude's Extended Thinking
11:01 Challenges and Controversies in AI Model Training
13:31 Technical Highlights and Code Trustworthiness
16:01 Token Costs and Incentives in AI Models
18:31 Thinking Budgets and AI Effort
21:01 Safety and Ethics in AI Model Development
23:31 Anthropic's Approach to AI Safety
26:01 LLM Arena and Evaluation Challenges
28:31 Developing Taste and Direction in AI Research
31:01 Recent Research and Multi-Turn RL
33:31 Tools and Incentives in AI Model Development