AI Engineering Podcast

Right-Sizing AI: Small Language Models for Real-World Production

Sep 20, 2025
In this discussion, Steven Huels, VP of AI Engineering at Red Hat, unpacks the power of small language models (SLMs) for real-world applications. He highlights the operational advantages of SLMs that fit on a single enterprise GPU. The conversation dives into self-hosting models versus relying on APIs, tackles organizational readiness, and discusses innovations in agentic systems. Steven shares real-world examples such as scam detection and emphasizes the importance of customization, automated evaluation, and continuous retraining for efficient AI deployment.
INSIGHT

Practical GPU-Based Model Size Heuristic

  • Define small vs. large models by whether the model fits on a single enterprise GPU, rather than by parameter count.
  • This practical heuristic shifts as hardware and software advance, changing what counts as "small" (a rough sketch of the arithmetic follows below).
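
One rough, illustrative way to apply this heuristic is a back-of-the-envelope memory check. The 80 GB GPU figure, the 20% overhead factor, and the bytes-per-parameter values below are assumptions for the sketch, not figures from the episode:

    def fits_on_single_gpu(params_billions: float,
                           bytes_per_param: float = 2.0,   # fp16/bf16 weights (assumption)
                           gpu_memory_gb: float = 80.0,    # assumed enterprise GPU capacity
                           overhead: float = 1.2) -> bool: # rough allowance for KV cache, runtime
        """Back-of-the-envelope check: do the weights plus overhead fit in GPU memory?"""
        weight_gb = params_billions * bytes_per_param  # 1e9 params * bytes, divided by 1e9 bytes/GB
        return weight_gb * overhead <= gpu_memory_gb

    # A 70B model in fp16 (~140 GB) does not fit, an 8B model (~16 GB) does,
    # and 4-bit quantization (0.5 bytes/param) brings the 70B model back under the line,
    # which is why "small" keeps shifting as hardware and software improve.
    print(fits_on_single_gpu(70))                       # False
    print(fits_on_single_gpu(8))                        # True
    print(fits_on_single_gpu(70, bytes_per_param=0.5))  # True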
ADVICE

Validate With The Best Model First

  • Start experiments with the best available frontier model to validate an idea quickly.
  • If the idea has value, scale down to smaller models to find the right cost-performance trade-off (a sketch of such a comparison follows below).
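
In practice this can look like a shared evaluation harness run first against the frontier model and then against smaller candidates. Everything below (the generate_with helper, the model names, the 90% threshold, and the crude containment metric) is a hypothetical sketch, not code from the episode:

    from typing import Callable, List, Tuple

    def evaluate(generate: Callable[[str], str],
                 eval_set: List[Tuple[str, str]]) -> float:
        """Fraction of prompts whose response contains the expected answer (a crude metric)."""
        hits = sum(1 for prompt, expected in eval_set
                   if expected.lower() in generate(prompt).lower())
        return hits / len(eval_set)

    # Hypothetical usage: generate_with(name) would wrap whatever serving endpoint you use.
    # frontier_score = evaluate(generate_with("frontier-model"), eval_set)
    # if frontier_score >= 0.9:   # the idea works; now look for the cheapest adequate model
    #     slm_score = evaluate(generate_with("8b-slm"), eval_set)
    #     print(f"quality retained by the SLM: {slm_score / frontier_score:.0%}")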
ADVICE

Match Hosting To Operational Maturity

  • Before self-hosting models, evaluate whether your IT organization already operates platforms in production.
  • If not, consider an integrated AI platform that extends existing operational skills and reduces the maintenance burden.