In an otherwise heavy week packed with Microsoft Build, Google I/O, and OpenAI’s io announcement, the worst-kept secret in biglab land was the launch of Claude 4, particularly the triumphant return of Opus, which many had been clamoring for. We will leave the specific Claude 4 recap to AINews, but we think that both Gemini’s progress on Deep Think this week and Claude 4 represent the next frontier of progress on inference-time compute/reasoning (at least until GPT-5 ships this summer).
Will Brown’s talk at AIE NYC and his open-source work on verifiers have made him one of the most prominent voices able to publicly discuss (aka without the vaguepoasting LoRA they put on you when you join a biglab) the current state of the art in reasoning models and where current SOTA research directions lead. We discussed his latest paper, Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment, and he previewed his AIEWF talk on Agentic RL for those with the temerity to power through bad meetup audio.
Chapters
- 00:00 Introduction and Episode Overview
- 02:01 Discussion on Claude 4 and its Features
- 04:31 Reasoning and Tool Use in AI Models
- 07:01 Extended Thinking in Claude and Model Differences
- 09:31 Speculation on Claude's Extended Thinking
- 11:01 Challenges and Controversies in AI Model Training
- 13:31 Technical Highlights and Code Trustworthiness
- 16:01 Token Costs and Incentives in AI Models
- 18:31 Thinking Budgets and AI Effort
- 21:01 Safety and Ethics in AI Model Development
- 23:31 Anthropic's Approach to AI Safety
- 26:01 LLM Arena and Evaluation Challenges
- 28:31 Developing Taste and Direction in AI Research
- 31:01 Recent Research and Multi-Turn RL
- 33:31 Tools and Incentives in AI Model Development
- 36:01 Challenges in Evaluating AI Model Outputs
- 38:31 Model-Based Rewards and Future Directions
- 41:01 Wrap-up and Future Plans