

Right-Sizing AI: Small Language Models for Real-World Production
Sep 20, 2025
In this discussion, Steven Huels, VP of AI Engineering at Red Hat, unpacks the power of small language models (SLMs) for real-world applications. He highlights the advantages of SLMs that fit on a single enterprise GPU and the operational benefits that follow. The conversation dives into self-hosting models versus relying on APIs, tackles organizational readiness, and covers innovations in agentic systems. Steven shares real-world examples like scam detection and emphasizes the importance of customization, automated evaluation, and continuous retraining for efficient AI deployment.
Practical GPU-Based Model Size Heuristic
- Define small vs large models by whether they fit on a single enterprise GPU rather than parameter count.
- This practical heuristic shifts as hardware and software advance, changing what counts as "small."
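The heuristic above boils down to simple memory arithmetic. A minimal sketch, assuming illustrative numbers (fp16 weights, an 80 GB GPU, and a flat overhead fraction for KV cache and activations — none of these specifics come from the episode):

```python
# Hypothetical sketch of the "fits on one enterprise GPU" heuristic.
# All constants are illustrative assumptions, not from the transcript.

def fits_on_single_gpu(params_billions: float,
                       bytes_per_param: int = 2,      # fp16/bf16 weights
                       gpu_memory_gb: float = 80.0,   # e.g. one 80 GB GPU
                       overhead_fraction: float = 0.2) -> bool:
    """Rough check: weight memory plus serving overhead must fit in one GPU."""
    weight_gb = params_billions * bytes_per_param      # 1B params * 2 B = 2 GB
    needed_gb = weight_gb * (1 + overhead_fraction)
    return needed_gb <= gpu_memory_gb

# An 8B model in fp16 needs ~16 GB plus overhead, so it fits on 80 GB.
print(fits_on_single_gpu(8))    # True
# A 70B model in fp16 needs ~140 GB plus overhead, so it does not.
print(fits_on_single_gpu(70))   # False
```

Note how the answer shifts with the defaults: quantizing to 4-bit weights (`bytes_per_param=0.5` effectively) or moving to a larger-memory GPU moves the boundary of what counts as "small," which is exactly the point of the heuristic.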
Validate With The Best Model First
- Start experiments with the best available frontier model to validate an idea quickly.
- If the idea has value, then scale down to smaller models to find the right cost-performance trade-off.
Match Hosting To Operational Maturity
- Evaluate whether your IT organization already runs platforms before self-hosting models.
- If not, consider an integrated AI platform to extend existing operational skills and reduce maintenance burden.