

Small Language Models are the Future of Agentic AI
Sep 5, 2025
Peter Belcak, an AI research scientist at NVIDIA, discusses his groundbreaking paper on the promise of small language models (SLMs) for agentic AI. He explains how SLMs can beat larger models on cost-effectiveness and operational efficiency, walks through the process of converting LLM-based agents into SLM-based ones, and introduces tools that support the necessary fine-tuning. He also addresses bias mitigation in data selection and the importance of collaboration in the evolving AI landscape, paving the way for a more accessible future.
AI Snips
SLMs Can Replace LLMs For Many Agent Tasks
- Small language models (SLMs) can handle many agent errands and match larger models for specific tasks.
- When capability, operational fit, and cost are weighed together, SLMs become the natural choice; the routing sketch below shows the pattern.
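A minimal sketch of that routing idea in Python. The `Task` fields, model names, and the `call_model` helper are all illustrative assumptions, not anything named in the episode: the point is simply that the orchestrator defaults to the small model and escalates to the large one only when a task genuinely needs broad capability.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model identifiers; placeholders, not real endpoints.
SLM = "slm-3b-finetuned"    # cheap, fast, specialized
LLM = "llm-70b-generalist"  # expensive generalist fallback

@dataclass
class Task:
    prompt: str
    needs_broad_knowledge: bool = False  # open-ended reasoning, rare domains

def pick_model(task: Task) -> str:
    # Narrow, repetitive agent errands (extraction, formatting, routing)
    # go to the fine-tuned SLM; only broad tasks escalate to the LLM.
    return LLM if task.needs_broad_knowledge else SLM

def run(task: Task, call_model: Callable[[str, str], str]) -> str:
    # call_model is an assumed inference client: (model_id, prompt) -> text.
    return call_model(pick_model(task), task.prompt)
```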
Hardware Favors Smaller Models For Cost Efficiency
- Specialized inference chips amplify efficiency gains more for SLMs than for LLMs, since small models fit more comfortably within on-chip memory and signaling constraints.
- That hardware advantage can translate into much lower per-token and per-query costs for SLMs; a back-of-the-envelope comparison follows below.
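To make the cost claim concrete, a back-of-the-envelope comparison. Every number here is an assumption for illustration (the episode gives no specific prices); the shape of the arithmetic is what matters: per-query cost scales linearly with per-token price, so a large gap in serving cost passes straight through to the bill.

```python
def cost_per_query(price_per_million_tokens: float, tokens_per_query: int) -> float:
    # Linear cost model: $/query = ($/1M tokens) * tokens / 1e6.
    return price_per_million_tokens * tokens_per_query / 1_000_000

# Hypothetical prices and usage; not measured figures.
slm_price = 0.06   # $ per million tokens for a small specialized model
llm_price = 3.00   # $ per million tokens for a large generalist
tokens = 800       # assumed tokens consumed by one agent step

print(f"SLM: ${cost_per_query(slm_price, tokens):.5f} per query")
print(f"LLM: ${cost_per_query(llm_price, tokens):.5f} per query")
print(f"-> roughly {llm_price / slm_price:.0f}x cheaper per query on the SLM")
```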
Code Orchestration Often Drives Agent Workflows
- Distinguish model agency (the model deciding the plan) from code agency (orchestration code driving the flow).
- Many enterprise agents already rely on code orchestration and invoke language models only for narrow language errands, as in the sketch below.
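A short sketch of the distinction, assuming only a generic `lm` callable (prompt in, text out); the function name and ticket fields are made up for illustration. The plan lives in ordinary code, and the language model is invoked once, for a narrow, well-scoped language errand:

```python
def handle_ticket(ticket: dict, lm) -> dict:
    # Code agency: deterministic orchestration logic, no model involved.
    if ticket["category"] == "refund" and ticket["amount"] < 50:
        decision = "approved"
    else:
        decision = "escalated"

    # The only model call: a narrow language errand (drafting a reply),
    # exactly the kind of task a small fine-tuned model can handle.
    reply = lm(
        f"Write a short, polite reply telling the customer their "
        f"{ticket['category']} request was {decision}."
    )
    return {"decision": decision, "reply": reply}
```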