Deep Papers

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture

Nov 24, 2025
Yongchao Chen, a final-year PhD student at Harvard and MIT, discusses his groundbreaking work on TUMIX (Tool-Use Mixture). He explains how a diverse ensemble of agents can significantly improve AI's accuracy by leveraging different tool-use strategies. Chen highlights the limitations of current models, which often struggle to decide when to use tools effectively. Through empirical tests, he shares remarkable results where TUMIX outperforms state-of-the-art methods, emphasizing the importance of agent diversity and collaborative refinement for enhancing AI performance.
INSIGHT

Models Don't Automatically Pick The Right Tool

  • Large models often fail to choose the right tool (code vs. text) without explicit hints.
  • Tool availability alone doesn't guarantee models will use tools effectively.
ANECDOTE

Code Execution Works, Yet Models Stay Overconfident

  • Chen shows examples where Claude generates code and gets correct results, while its direct textual answers are wrong.
  • This demonstrates models can execute tools correctly but still overconfidently answer without using them.
INSIGHT

Parallel Diverse Agents With Iterative Refinement

  • TUMIX runs many pre-designed agents in parallel, each with different tool-use strategies, then iteratively shares and refines answers.
  • Round-by-round exchange raises group accuracy as agents converge on better solutions.
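The round-by-round exchange described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the agents here are hypothetical stand-ins (a "code" agent that evaluates arithmetic and a "text" agent that guesses but defers to a peer majority), chosen only to show how sharing answers across rounds lets a diverse mixture converge.

```python
from collections import Counter

def tumix_round_loop(agents, question, rounds=3):
    """Toy TUMIX-style loop: every agent answers in parallel, then in
    later rounds each agent sees the previous round's answers and may
    revise. The final answer is a majority vote over the mixture."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds - 1):
        shared = list(answers)  # snapshot of the last round's answers
        answers = [agent(question, shared) for agent in agents]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-in agents with different tool-use strategies.
def code_agent(question, peer_answers):
    # stands in for "write and execute code": computes the result
    return str(eval(question))

def text_agent(question, peer_answers):
    # stands in for direct textual answering: wrong on its own,
    # but revises when a majority of peers agree on another answer
    if peer_answers:
        top, count = Counter(peer_answers).most_common(1)[0]
        if count > len(peer_answers) // 2:
            return top
    return "41"  # overconfident wrong guess

agents = [code_agent, code_agent, text_agent]
print(tumix_round_loop(agents, "6 * 7"))  # majority converges on "42"
```

In this sketch the text agent answers incorrectly in round one, sees that the two code agents agree on "42", and converges in round two, which is the mechanism Chen describes: diversity supplies candidate answers, and iterative sharing raises group accuracy.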